Abstract

The so-called "cogen approach" to program specialisation, writing a compiler generator
instead of a specialiser, has been used with considerable success in partial evaluation 
of both functional and imperative languages. This paper demonstrates that the cogen 
approach is also applicable to the specialisation of logic programs (called partial deduction) 
and leads to effective specialisers. Moreover, using good binding-time annotations, the 
speed-ups of the specialised programs are comparable to the speed-ups obtained with 
online specialisers. 

The paper first develops a generic approach to offline partial deduction and then a 
specific offline partial deduction method, leading to the offline system lix for pure logic 
programs. While this is a usable specialiser by itself, it is used to develop the cogen system 
LOGEN. Given a program, a specification of what inputs will be static, and an annotation 
specifying which calls should be unfolded, LOGEN generates a specialised specialiser for 
the program at hand. Running this specialiser with particular values for the static inputs 
results in the specialised program. While this requires two steps instead of one, the effi- 
ciency of the specialisation process is improved in situations where the same program is 
specialised multiple times. 

The paper also presents and evaluates an automatic binding-time analysis that is able 
to derive the annotations. While the derived annotations are still suboptimal compared 
to hand-crafted ones, they enable non-expert users to use the LOGEN system in a fully 
automated way. 

Finally, LOGEN is extended so as to directly support a large part of Prolog's declarative 
and non-declarative features and so as to be able to perform so-called mixline
specialisations.

Keywords Partial evaluation, partial deduction, program specialisation, compiler gen- 
eration, abstract interpretation 



1 Introduction 

Partial evaluation has over the past decade received considerable attention
in functional (e.g. (Jones, Gomard and Sestoft 1993)), imperative (e.g. (Andersen
1994)) and logic programming (e.g. (Gallagher 1993, Komorowski 1992, Pettorossi 
and Proietti 1994)). Partial evaluators are also sometimes called mix, as they usu- 
ally perform a mixture of evaluation and code generation steps. In the context of 
pure logic programs, partial evaluation is sometimes referred to as partial deduc- 
tion, the term partial evaluation being reserved for the treatment of impure logic 
programs. 

Guided by the Futamura projections (Futamura 1971) a lot of effort, especially
in the functional partial evaluation community, has been put into making systems
self-applicable. A partial evaluation or deduction system is called self-applicable
if it is able to effectively 1 specialise itself. In that case one may, according to the 
second Futamura projection, obtain compilers from interpreters and, according to 
the third Futamura projection, a compiler generator (cogen for short). In essence, 
given a particular program P, a cogen generates a specialised specialiser for P. If 
P is an interpreter a cogen thus generates a compiler. 

However, writing an effectively self-applicable specialiser is a non-trivial task:
the more features one uses in writing the specialiser, the more complex the
specialisation process becomes, because the specialiser then has to handle these features
as well. This is why so far no partial evaluator for full Prolog (like MIXTUS (Sahlin 
1993), or paddy (Prestwich 1992)) is effectively self-applicable. On the other hand a 
partial deducer which specialises only purely declarative logic programs (like SAGE 
(Gurr 1994) or the system in (Bondorf, Frauendorf and Richter 1990)) has itself to 
be written purely declaratively, leading to slow systems and impractical compilers
and compiler generators. 

So far the only practical compilers and compiler generators for logic programs 
have been obtained by (Fujita and Furukawa 1988) and (Mogensen and Bondorf 
1992). However, the specialisation in (Fujita and Furukawa 1988) is incorrect with 
respect to some extra-logical built-ins, leading to incorrect results when attempting 
self-application (Bondorf et al. 1990). The partial evaluator logimix (Mogensen 
and Bondorf 1992) does not share this problem, but gives only modest speedups 
when self-applied (compared to results for functional programming languages; see 
(Mogensen and Bondorf 1992)) and cannot handle partially static data. 

However, the actual creation of the cogen according to the third Futamura pro- 
jection is not of much interest to users since cogen can be generated once and for all 
when a specialiser is given. Therefore, from a user's point of view, whether a cogen 
is produced by self-application or not is of little importance; what is important is 
that it exists and that it is efficient and produces efficient, non-trivial specialised 
specialisers. This is the background behind the approach to program specialisa- 
tion called the cogen approach (as opposed to the more traditional mix approach): 
instead of trying to write a partial evaluation system mix which is neither too inef- 
ficient nor too difficult to self-apply one simply writes a compiler generator directly. 
This is not as difficult as one might imagine at first sight: basically the cogen turns
out to be just a simple extension of a "binding-time analysis" for logic programs
(something first discovered for functional languages in (Holst 1989) and then
exploited in, e.g., (Holst and Launchbury 1991, Birkedal and Welinder 1994, Andersen
1994, Glück and Jørgensen 1995, Thiemann 1996)).

1 This implies some efficiency considerations, e.g. the system has to terminate within reasonable
time constraints, using an appropriate amount of memory.

In this paper we will describe the first cogen written in this way for a logic 
programming language. We start out with a cogen for a small subset of Prolog 
and progressively improve it to handle a large part of Prolog and to extend its 
capabilities. 

Although the Futamura projections focus on how to generate a compiler from 
an interpreter, the projections of course also apply when we replace the interpreter 
by some other program. In this case the program produced by the second Futa- 
mura projection is not called a compiler, but a generating extension. The program 
produced by the third Futamura projection could rightly be called a generating 
extension generator or gengen, but we will stick to the more conventional cogen. 

The main contributions of this work are: 

1. A formal specification of the concept of binding-time analysis and more gener- 
ally binding-type analysis, allowing the treatment of partially static structures, 
in a (pure) logic programming setting and a description of how to obtain a 
generic procedure for offline partial deduction from such an analysis. 

2. Based upon point 1, the first description of an efficient, handwritten compiler 
generator (cogen) for a logic programming language, which has — exactly as 
for other handwritten cogens for other programming paradigms — a quite 
elegant and natural structure. 

3. A way to handle both extra-logical features (such as var/1) and side-effects
(such as print/1) within the cogen. A refined treatment of the call/1 predicate
is also presented. 

4. How to handle negation, disjunction and the if-then-else conditional in the 
cogen. 

5. Experimental results showing the efficiency of the cogen, the generating ex- 
tensions, and also of the specialised programs. 

6. A method to obtain a binding-type analysis through the exploitation of
existing termination analysers.

This paper is a much extended and revised version of (Jørgensen and Leuschel
1996): points 3, 4, 5, 6 and the partially static structures of point 1 are new, leading 
to a more powerful and practically useful cogen. 

The paper is organised as follows: In Section 2 we formalise the concept of off-line 
partial deduction and the associated binding-type analysis. In Section 3 we present 
and explain our cogen approach in a pure logic programming setting, starting from 
the structure of the generating extensions. In Section 4 we discuss the treatment 
of declarative and non-declarative built-ins as well as constructs such as negations, 
conditionals, and disjunctions. In Section 5 we present experimental results under- 
lining the efficiency of the cogen and of the generating extensions it produces. We 
also compare the results against a traditional offline specialiser. In Section 6 we 
present a method for doing an automatic binding-type analysis. We evaluate the
efficiency and quality of this approach using some experiments. We conclude with 
some discussions of related and future work in Section 7. 

2 Off-line Partial Deduction 

Throughout this paper, we suppose familiarity with basic notions in logic pro- 
gramming. We follow the notational conventions of (Lloyd 1987). In particular, in 
programs, we denote variables through strings starting with an upper-case symbol, 
while the notations of constants, functions and predicates begin with a lower-case 
character. 

2.1 A Generic Partial Deduction Method 

We start off by presenting a general procedure for performing partial deduction. 
More details on partial deduction and how to control it can be found, e.g., in 
(Leuschel and Bruynooghe 2002). 

Given a logic program P and a goal G, partial deduction produces a new program 
P' which is P "specialised" to the goal G; the aim being that the specialised program 
P' is more efficient than the original program P for all goals which are instances of 
G. The underlying technique of partial deduction is to construct finite, non-trivial 
but possibly incomplete SLDNF-trees. (A trivial SLDNF-tree is one in which no 
literal in the root has been selected for resolution, while an incomplete SLDNF-tree
is an SLDNF-tree which, in addition to success and failure leaves, may also
contain leaves where no literal has been selected for a further derivation step.) The 
derivation steps in these SLDNF-trees correspond to the computation steps which 
have already been performed by the partial deducer and the clauses of the specialised 
program are then extracted from these trees by constructing one specialised clause 
(called a resultant) per non-failing branch. These SLDNF-trees and resultants are 
obtained as follows. 

Definition 1 

An unfolding rule is a function which, given a program P and a goal G, returns a 
non-trivial and possibly incomplete SLDNF-tree for $P \cup \{G\}$.

Definition 2 

Let P be a normal program and A an atom. Let $\tau$ be a finite, incomplete SLDNF-tree
for $P \cup \{\leftarrow A\}$. Let $\leftarrow G_1, \ldots, \leftarrow G_n$ be the goals in the leaves of the
non-failing branches of $\tau$. Let $\theta_1, \ldots, \theta_n$ be the computed answers of the derivations
from $\leftarrow A$ to $\leftarrow G_1, \ldots, \leftarrow G_n$ respectively. Then the set of resultants, $resultants(\tau)$, is
defined to be the set of clauses $\{A\theta_1 \leftarrow G_1, \ldots, A\theta_n \leftarrow G_n\}$. We also define the set
of leaves, $leaves(\tau)$, to be the atoms occurring in the goals $G_1, \ldots, G_n$.

Partial deduction uses the resultants for a given set of atoms S to construct the 
specialised program (and for each atom in S a different specialised predicate
definition will be generated). Under the conditions stated in (Lloyd and Shepherdson
1991), namely closedness (all leaves are an instance of an atom in S) and
independence (no two atoms in S have a common instance), correctness of the specialised
program is guaranteed. 

In most practical approaches independence is ensured by using a renaming
transformation which maps dependent atoms to new predicate symbols. Adapted
correctness results can be found in (Benkerimi and Hill 1993, Leuschel, Martens and
De Schreye 1998) and (Leuschel, De Schreye and de Waal 1996). Renaming is often
combined with argument filtering to improve the efficiency of the specialised
program; see e.g. (Gallagher and Bruynooghe 1990, Benkerimi and Hill 1993, Leuschel
and Sørensen 1996).

Closedness can be ensured by using the following outline of a partial deduction 
procedure, similar to the ones used in e.g. (Gallagher 1991, Gallagher 1993, Leuschel 
and De Schreye 1998). 

Procedure 1 (Partial deduction)

Input: a program P and an initial set $S_0$ of atoms to be specialised
Output: a set of atoms S
Initialisation: $S_{new} := generalise(S_0)$
repeat
    $S_{old} := S_{new}$
    $S_{new} := \{ s_n \mid s_n \in leaves(unfold(P, s_o)) \wedge s_o \in S_{old} \}$
    $S_{new} := generalise(S_{old} \cup S_{new})$
until $S_{old} = S_{new}$ (modulo variable renaming)
output $S := S_{new}$

The above procedure is parametrised by an unfolding rule unfold and a
generalisation operation generalise. The latter can be used to ensure termination and can
be formally defined as follows. 

Definition 3 

A generalisation operation is a function generalise from sets of atoms to sets of
atoms such that, for any finite set of atoms S, generalise(S) is a finite set of atoms 
S' using the same predicates as those in S, and every atom in S is an instance of 
an atom in S'. 

If Procedure 1 terminates then the closedness condition is satisfied. Finally, note
that two sets of atoms $S_1$ and $S_2$ are said to be identical modulo variable renaming
if for every $s_1 \in S_1$ there exists $s_2 \in S_2$ such that $s_1$ and $s_2$ are variants, and vice
versa.
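The fixpoint iteration of Procedure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: terms are encoded as nested tuples `(functor, arg1, ...)`, variables as plain strings, and the names `canonical`, `partial_deduction`, `toy_leaves` and `toy_generalise` are hypothetical. The toy unfolding and generalisation functions stand in for a real unfolding rule and generalisation operation; comparing canonical forms implements equality modulo variable renaming.

```python
def canonical(term, mapping=None):
    """Rename variables to V0, V1, ... in order of first occurrence."""
    if mapping is None:
        mapping = {}
    if isinstance(term, str):  # variables are plain strings in this encoding
        return mapping.setdefault(term, "V%d" % len(mapping))
    return (term[0],) + tuple(canonical(arg, mapping) for arg in term[1:])


def partial_deduction(initial_atoms, leaves_of, generalise, max_iterations=100):
    """Iterate S_new := generalise(S_old U leaves) until a fixpoint is reached."""
    s_new = {canonical(a) for a in generalise(set(initial_atoms))}
    for _ in range(max_iterations):
        s_old = s_new
        leaves = {leaf for atom in s_old for leaf in leaves_of(atom)}
        s_new = {canonical(a) for a in generalise(s_old | leaves)}
        if s_new == s_old:  # identical modulo variable renaming
            return s_new
    raise RuntimeError("no fixpoint reached within max_iterations")


def toy_leaves(atom):
    """Hypothetical unfolding result: p(T) leaves the calls p(s(T)) and q(T)."""
    if atom[0] == "p":
        return [("p", ("s", atom[1])), ("q", atom[1])]
    return []


def toy_generalise(atoms):
    """Hypothetical generalisation: replace every argument by a variable."""
    return {(a[0],) + tuple("X%d" % i for i in range(len(a) - 1)) for a in atoms}


result = partial_deduction({("p", ("0",))}, toy_leaves, toy_generalise)
```

Without the generalisation step the toy example would produce the ever-growing atoms p(s(0)), p(s(s(0))), and so on; with it, the loop stabilises on one atom per predicate.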

2.2 Off-Line Partial Deduction and Binding-Types 

In Procedure 1 one can distinguish between two different levels of control. The 
unfolding rule U controls the construction of the incomplete SLDNF-trees. This is 
called the local control (Gallagher 1993, Martens and Gallagher 1995). The gen- 
eralisation operation controls the construction of the set of atoms for which such 
SLDNF-trees are built. We will refer to this aspect as the global control. 
The control problems have been tackled from two different angles: the so-called
off-line versus on-line approaches. The on-line approach performs all the control 
decisions during the actual specialisation phase. The off-line approach on the other 
hand performs an analysis phase prior to the actual specialisation phase, based on 
a description of what kinds of specialisations will be required. This analysis phase 
provides annotations which then guide the specialisation phase proper, often to the 
point of making it almost trivial. 

Partial evaluation of functional programs (Consel and Danvy 1993, Jones et al. 
1993) has mainly stressed off-line approaches, while supercompilation of functional 
(Turchin 1986, Sørensen and Glück 1995) and partial deduction of logic programs
(Gallagher and Bruynooghe 1991, Sahlin 1993, Bol 1993, Bruynooghe, De Schreye
and Martens 1992, Martens and De Schreye 1996, Martens and Gallagher 1995,
Leuschel et al. 1998, De Schreye, Glück, Jørgensen, Leuschel, Martens and Sørensen
1999) have mainly concentrated on on-line control. 

An initial motivation for using the off-line approach was to achieve effective
self-application (Jones, Sestoft and Søndergaard 1989). But the off-line approach is in
general also much more efficient since many decisions concerning control are made 
before and not during specialisation. This is especially true in a setting where the 
same program is re-specialised several times. (Note, however, that the global control 
is usually not done in a fully offline fashion: almost all offline partial evaluators 
maintain during specialisation a list of calls that have been previously specialised 
or are pending (Jones et al. 1993).)

Most off-line approaches perform what is called a binding-time analysis (BTA)
prior to the specialisation phase. The purpose of this analysis is to figure out which 
values will be known at specialisation time proper and which values will only be 
known at runtime. The simplest approach is to classify arguments within the pro- 
gram to be specialised as either static or dynamic. The value of a static argument 
will be definitely known (bound) at specialisation time whereas a dynamic argument 
is not necessarily known at specialisation time. In the context of partial deduction 
of logic programs, a static argument can be seen (Mogensen and Bondorf 1992) as 
being a term which is guaranteed not to be more instantiated at run-time (it can 
never be less instantiated at run-time; otherwise the information provided would 
be incorrect). For example if we specialise a program for all instances of p(a,X) 
then the first argument to p is static while the second one is dynamic.

This approach is successful for functional programs, but often proves to be too 
weak for logic programs: in logic programming partially instantiated data struc- 
tures appear naturally even at runtime. A simple classification of arguments into 
"fully known" or "totally unknown" is therefore unsatisfactory and would prevent 
specialising a lot of "natural" logic programs such as the vanilla metainterpreter 
(Hill and Gallagher 1998, Martens and De Schreye 1995) or most of the benchmarks 
from the DPPD library (Leuschel 1996-2000). 

The basic idea to improve upon the above shortcoming is to describe parts of
arguments which will actually be known at specialisation time by a special form of
types. 2 Below, we will develop the first such description, of what we call
binding-types, in logic programming.

Binding-Types

In logic programming, a type can be defined as a set of terms closed under substitu- 
tion (Apt and Marchiori 1994). We will stick to this view and adapt the definitions 
and concepts of (Yardeni, Frühwirth and Shapiro 1992) (which mainly follow the
HiLog notation (Chen, Kifer and Warren 1989)).

As is common in polymorphically typed languages (e.g. (Somogyi et al. 1996)), 
types are built up from type variables and type constructors in much the same
way as terms are built up from ordinary variables and function symbols. Formally, a
type is either a type variable or a type constructor of arity $n \geq 0$ applied to n types.
We presuppose the existence of three 0-ary type constructors: static, dynamic, and 
nonvar. These constructors will be given a pre-defined meaning below. Also, a type 
which contains no variables is called ground. 

Definition 4 

A type definition for a type constructor c of arity n is of the form

$c(V_1, \ldots, V_n) \longrightarrow f_1(T_1^1, \ldots, T_1^{n_1}) \;;\; \ldots \;;\; f_k(T_k^1, \ldots, T_k^{n_k})$

with $k \geq 1$, $n, n_1, \ldots, n_k \geq 0$ and where $f_1, \ldots, f_k$ are distinct function symbols,
$V_1, \ldots, V_n$ are distinct type variables, and the $T_i^j$ are types which only contain type
variables in $\{V_1, \ldots, V_n\}$.

A type system T is a set of type definitions, exactly one for every type constructor
c different from static, dynamic, and nonvar. We will refer to the type definition for
c in T by $Def_T(c)$.

From now on we will suppose that the underlying type system T is fixed. A type
system $T_1$, defining a type constructor for parametric lists, can be defined as follows:
$T_1 = \{ list(T) \longrightarrow nil \;;\; cons(T, list(T)) \}$. Using the ASCII notations of Mercury
(Somogyi et al. 1996) and using Prolog's list notation, the type system $T_1$ would
be written down as follows:

:- type list(T) ---> [ ] ; [T | list(T)].

We define type substitutions to be finite sets of the form $\{V_1/\tau_1, \ldots, V_k/\tau_k\}$, where
every $V_i$ is a type variable and $\tau_i$ a type. Type substitutions can be applied to types
(and type definitions) to produce instances in exactly the same way as substitutions
can be applied to terms. For example, $list(V)\{V/static\} = list(static)$. A type or
type definition is called ground if it contains no type variables. 

We now define type judgements relating terms to types in the underlying type 
system T. 

2 This is somewhat related to the way instantiations are defined in the Mercury language
(Somogyi, Henderson and Conway 1996). But there are major differences, which we discuss later.

Definition 5

Type judgements have the form $t : \tau$, where t is a term and $\tau$ is a type, and are
inductively defined as follows.

• t : dynamic holds for any term t 

• t : static holds for any ground term t 

• t : nonvar holds for any non-variable term t 

• $f(t_1, \ldots, t_n) : c(\tau'_1, \ldots, \tau'_k)$ holds if there exists a ground instance of the
type definition $Def_T(c)$ in the underlying type system T which has the form
$c(\tau'_1, \ldots, \tau'_k) \longrightarrow \ldots\, f(\tau_1, \ldots, \tau_n) \,\ldots$ and where $t_i : \tau_i$ for $1 \leq i \leq n$.

We also say that a type $\tau$ is more general than another type $\tau'$ iff whenever $t : \tau'$
then also $t : \tau$.

Note that our definitions guarantee that types are downwards-closed in the sense
that for all terms t and types $\tau$ we have $t : \tau \Rightarrow t\theta : \tau$.

Here are a few examples, using the type system $T_1$ above. First, we have s(0) :
static, s(0) : nonvar, and s(0) : dynamic. Also, s(X) : nonvar, s(X) : dynamic
but not s(X) : static. For variables we have X : dynamic, but neither X : static
nor X : nonvar. A few examples with lists (using Prolog's list notation) are as
follows: [] : list(static), s(0) : static hence [s(0)] : list(static), X : dynamic
and Y : dynamic hence [X, Y] : list(dynamic). Finally, we have, for example, that
list(dynamic) is more general than list(static).
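The type judgements above can be checked mechanically. The following Python sketch is an illustration only, not the implementation used in LOGEN: it assumes terms are encoded as nested tuples `(functor, arg1, ...)` with variables as plain strings, and the names `TYPE_DEFS`, `has_type`, `subst_type` and `plist` are hypothetical. The table `TYPE_DEFS` encodes the list type definition given earlier in the text.

```python
# Type system from the text: list(T) ---> nil ; cons(T, list(T))
TYPE_DEFS = {
    "list": (("T",), [("nil", ()), ("cons", ("T", ("list", "T")))]),
}


def is_var(term):
    return isinstance(term, str)  # variables are plain strings


def is_ground(term):
    return not is_var(term) and all(is_ground(a) for a in term[1:])


def subst_type(ty, binding):
    """Apply a type substitution (dict: type variable -> type) to a type."""
    if isinstance(ty, str):
        return binding.get(ty, ty)
    return (ty[0],) + tuple(subst_type(a, binding) for a in ty[1:])


def has_type(term, ty):
    """Decide the judgement term : ty for a ground type ty."""
    if ty == "dynamic":
        return True
    if ty == "static":
        return is_ground(term)
    if ty == "nonvar":
        return not is_var(term)
    if is_var(term):  # a variable never has a constructed type
        return False
    params, alternatives = TYPE_DEFS[ty[0]]
    binding = dict(zip(params, ty[1:]))  # ground instance of the definition
    for functor, arg_types in alternatives:
        if functor == term[0] and len(arg_types) == len(term) - 1:
            return all(has_type(a, subst_type(t, binding))
                       for a, t in zip(term[1:], arg_types))
    return False


def plist(*items):
    """Build a list term from its elements using nil/cons."""
    term = ("nil",)
    for item in reversed(items):
        term = ("cons", item, term)
    return term


s0 = ("s", ("0",))   # the term s(0)
sX = ("s", "X")      # the term s(X)
```

With these definitions, `has_type(plist("X", "Y"), ("list", "dynamic"))` holds while `has_type(plist("X", "Y"), ("list", "static"))` does not, mirroring the observation that list(dynamic) is more general than list(static).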

Binding-Type Analysis and Classification 

We will now formalise the concept of a binding-type analysis (which is an extension 
of a binding-time analysis, as in (Jørgensen and Leuschel 1996)). For that we first
define the concept of a division which assigns types to arguments of predicates. 

Definition 6 

A division for a predicate p of arity n is an expression of the form $p(\tau_1, \ldots, \tau_n)$
where each $\tau_i$ is a ground type.

A division for a program P is a set of divisions for predicates in Pred(P), with
at most one division for any predicate. When there is no ambiguity about the
underlying program P we will also often simply refer to a division.

A division is called simple iff it contains only the types static and dynamic.

A division A is called more general than another division A' iff $\forall\, p(\tau'_1, \ldots, \tau'_n) \in A'$
there exists $p(\tau_1, \ldots, \tau_n) \in A$ such that for $1 \leq i \leq n$, $\tau_i$ is more general than $\tau'_i$.

The fact that divisions only use ground types means that we do not cater for 
polymorphic types, although we can still use parametric types. This simplifies the
remainder of the presentation (mainly Definition 14) but can probably be lifted. As 
can be seen from the above definition, we restrict ourselves to monovariant divisions 
in this paper. As discussed in (Jones et al. 1993), a way to handle polyvariant 
divisions by a monovariant approach is to "invent sufficiently many versions of each 
predicate." 

Now, a binding-type analysis will, given a program P (and some description of 
how P will be specialised), perform a pre-processing analysis and return a single 
division for every predicate in P describing the part of the values that will be known
at specialisation time. It will also return an annotation which will then guide the 
local unfolding process of the actual partial deduction. For the time being, an 
annotation can simply be seen as a particular unfolding rule U. We will return to 
this in Section 2.3. 

We are now in a position to formally define a binding-type analysis in the context 
of (pure) logic programs: 

Definition 7 

A binding-type analysis (BTA) yields, given a program P and an arbitrary initial
division $A_0$ for P, a couple (U, A) consisting of an unfolding rule U and a division
A for P more general than $A_0$. We will call the result of such an analysis a
binding-type classification (BTC).

The purpose of the initial division $A_0$ is to give information about how the
program will be specialised: it specifies what form the initial atom(s) (i.e., the ones
in $S_0$ of Procedure 1) can take. The role of A is to give information about the
atoms and their binding types that can occur at the global level (i.e., the ones
in $S_{new}$ and $S_{old}$ of Procedure 1). In that light, not all BTCs are correct and we
have to develop a safety criterion. Basically a BTC is safe iff every atom that can
potentially appear in one of the sets $S_{new}$ of Procedure 1 (given the restrictions
imposed by the annotation of the BTA) corresponds to the patterns described by
A. 3

We first define a safety notion for atoms and goals.

Definition 8

Let P be a program and let A be a division for P and let $p(t_1, \ldots, t_n)$ be an atom.
Then $p(t_1, \ldots, t_n)$ is safe wrt A iff $\exists\, p(\tau_1, \ldots, \tau_n) \in A$ such that $\forall i \in \{1, \ldots, n\}$ we
have $t_i : \tau_i$. A set of atoms S is safe wrt A iff every atom in S is safe wrt A. Also
a goal G is safe wrt A iff all the atoms occurring in G are safe wrt A.

For example p(a, X) and $\leftarrow p(a, a), p(b, c)$ are safe wrt A = {p(static, dynamic)}
while p(X, a) is not. 
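The safety check for atoms and goals amounts to a pointwise type check against the division. The Python sketch below is illustrative only and restricts itself to the three basic types static, dynamic and nonvar. It assumes terms are nested tuples `(functor, arg1, ...)` with variables as plain strings, and a division represented as a dictionary from `(predicate, arity)` to a tuple of types; the names `atom_is_safe` and `goal_is_safe` are hypothetical.

```python
def is_var(term):
    return isinstance(term, str)  # variables are plain strings


def is_ground(term):
    return not is_var(term) and all(is_ground(a) for a in term[1:])


def has_basic_type(term, ty):
    """Judgement term : ty for the three predefined types only."""
    if ty == "dynamic":
        return True
    if ty == "static":
        return is_ground(term)
    if ty == "nonvar":
        return not is_var(term)
    raise ValueError("only static, dynamic and nonvar are handled in this sketch")


def atom_is_safe(atom, division):
    """An atom is safe iff each argument has the type the division assigns to it."""
    types = division.get((atom[0], len(atom) - 1))
    if types is None:  # no division for this predicate
        return False
    return all(has_basic_type(arg, ty) for arg, ty in zip(atom[1:], types))


def goal_is_safe(goal, division):
    """A goal (list of atoms) is safe iff all of its atoms are safe."""
    return all(atom_is_safe(atom, division) for atom in goal)


division = {("p", 2): ("static", "dynamic")}
a, b, c = ("a",), ("b",), ("c",)
```

Here `atom_is_safe(("p", a, "X"), division)` and `goal_is_safe([("p", a, a), ("p", b, c)], division)` both hold, whereas `atom_is_safe(("p", "X", a), division)` fails because the first argument is not ground.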

Definition 9 

Let $\beta = (U, A)$ be a BTC for a program P. Then $\beta$ is a globally safe BTC for P
iff for every goal G which is safe wrt A, U(P, G) is an SLDNF-tree $\tau$ for $P \cup \{G\}$
whose leaf goals are safe wrt A. A BTA is globally safe if for any program P it
produces a globally safe BTC for P.

Sometimes, in order to simplify both the partial deducer and the BTA, one
might want to generalise atoms and then lift them to the global level (i.e., $S_{new}$ in
Procedure 1) before the full SLDNF-tree $\tau$ has been built, namely at the point where
a left-to-right selection rule would have selected the atom. This is the motivation
behind the following notion of a strongly globally safe BTC.

3 Our safety condition differs somewhat from the classical uniform congruence requirement 
(Launchbury 1991, Jones et al. 1993). We discuss this difference in Section 7.1.
Fig. 1. Different types of safety for a sample, incomplete SLD-tree

Definition 10

Let $\beta = (U, A)$ be a BTC for a program P. Then $\beta$ is a strongly globally safe BTC
for P iff it is globally safe for P and for every goal G which is safe wrt A, U(P, G)
is an SLDNF-tree such that the literals to the left of selected literals are also safe
wrt A.

Notice that in the above definitions of safety no requirement is made about the
actual atoms selected by U. Indeed, contrary to functional or imperative program- 
ming languages, definite logic programs can handle uninstantiated variables and 
a positive atom can always be selected. Nonetheless, if we have negative literals 
or Prolog built-ins, this is no longer true. For example, X is Y + 1 can only be 
selected if Y is ground. Put in other terms, we can only select a call "s is t" if 
it is safe wrt {is(dynamic, static)}. Also, we might want to restrict unfolding of 
user-defined predicates to cases where only one clause matches. For example, we 
might want to unfold a call app(r,s,t) (see Example 1 below) only if it is safe 
wrt {app(static, dynamic, dynamic)}. This motivates the next definition, which can 
be used to ensure that only properly instantiated calls to built-ins and atoms are 
selected. 

Definition 11 

A BTC $\beta = (U, A)$ is locally safe for P iff for every goal G which is safe wrt A,
U(P, G) is an SLDNF-tree for $P \cup \{G\}$ where all selected literals are safe wrt A.

The difference between local and global safety is illustrated in Figure 1. Note 
that it might make sense to use different divisions for local and global safety. This 
can be easily allowed, but we will not do so in the presentation of this article. 

Let us now return to the global control. Definition 9 requires atoms to be safe 
in the leaves of incomplete SLDNF-trees, i.e. at the point where the atoms get 
abstracted and then lifted to the global level. So, in order for Definition 9 to ensure 
safety at all stages of Procedure 1, the particular generalisation operation employed 
should not abstract atoms which are safe wrt A into atoms which are no longer 
safe wrt A. 

This motivates the following definition:

Definition 12

A generalisation operation generalise is safe wrt a division A iff for every finite
set of atoms S which is safe wrt A, generalise(S) is also safe wrt A.

In particular this means that generalise can only generalise positions marked
as dynamic or the arguments of positions marked as nonvar within the respective
binding-type. For example, generalise({p([])}) = {p(X)} is neither safe wrt
A = {p(static)} nor wrt A' = {p(nonvar)} nor wrt A'' = {p(list(dynamic))},
but it is safe wrt A''' = {p(dynamic)}. Also, generalise({p(f([]))}) = {p(f(X))}
is not safe wrt A = {p(static)} but is safe wrt both A' = {p(nonvar)} and
A''' = {p(dynamic)}.

Example 1 

Let P be the well known append program

$app([\,], L, L) \leftarrow$
$app([H|X], Y, [H|Z]) \leftarrow app(X, Y, Z)$

Let A = {app(static, dynamic, dynamic)} and let U be any unfolding rule. Then
(U, A) is a globally and locally safe BTC for P. E.g., the goal $\leftarrow app([a, b], Y, Z)$
is safe wrt A and U can either stop at $\leftarrow app([b], Y, Z)$, $\leftarrow app([\,], Y', Z')$ or at the
empty goal $\Box$. All of these goals are safe wrt A. More generally, unfolding a goal
$\leftarrow app(t_1, t_2, t_3)$ where $t_1$ is ground (and thus static), leads only to goals whose first
arguments are ground (static).

2.3 LIX, a Particular Off-Line Partial Deduction Method 

In this subsection we define a specific off-line partial deduction method which will 
serve as the basis for the cogen developed in the remainder of this paper. For sim- 
plicity, we will, until further notice, restrict ourselves to definite programs. Negation 
will in practice be treated in the cogen either as a built-in or via the if-then-else 
construct (both of which we will discuss later). 

We first define a particular class of simple-minded but effective unfolding rules. 

Definition 13 

An annotation $\mathcal{A}$ for a program P marks every literal in the body of each clause of
P as either reducible or non-reducible. A program P together with an annotation
$\mathcal{A}$ for P is called an annotated program, and is denoted by $P_{\mathcal{A}}$.

Given an annotation $\mathcal{A}$ for P, $U_{\mathcal{A}}$ denotes the unfolding rule which given a goal G
computes $U_{\mathcal{A}}(P, G)$ by unfolding the leftmost atom in G and then continuously
unfolds leftmost reducible atoms until an SLD-tree is obtained with only non-reducible
atoms in the leaves.

Syntactically we represent an annotation for P by underlining the predicate
symbol of reducible literals. 4

4 In functional programming one usually underlines the non-reducible calls. But in logic program- 
ming underlining a literal is usually used to denote selected literals and therefore underlining 
the reducible calls is more intuitive. 
Example 2

Let $P_{\mathcal{A}}$ be the following annotated program

$p(X) \leftarrow \underline{q}(X, Y), \underline{q}(Y, Z)$
$q(a, b) \leftarrow$
$q(b, a) \leftarrow$

Let A = {p(static), q(static, dynamic)}. Then $\beta = (U_{\mathcal{A}}, A)$ is a globally safe BTC
for P. For example the goal $\leftarrow p(a)$ is safe wrt A and unfolding it according to
$U_{\mathcal{A}}$ will lead (via the intermediate goals $\leftarrow q(a, Y), q(Y, Z)$ and $\leftarrow q(b, Z)$) to the
empty goal $\Box$ which is safe wrt A. Note that every selected atom is safe wrt A,
hence $\beta$ is actually also locally safe for P. Also note that $\beta' = (U_{\mathcal{A}'}, A)$, where $\mathcal{A}'$
marks every literal as non-reducible, is not a safe BTC for P. For instance, given
the goal $\leftarrow p(a)$ the unfolding rule $U_{\mathcal{A}'}$ just performs one unfolding step and thus
stops at the goal $\leftarrow q(a, Y), q(Y, Z)$ which contains the unsafe atom q(Y, Z).
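The behaviour of an annotation-driven unfolding rule on this example can be reproduced with a small SLD unfolder. The Python sketch below is a toy illustration, not the lix or LOGEN implementation: it assumes definite programs, terms encoded as nested tuples `(functor, arg1, ...)`, variables as plain strings, unification without occurs check, and each body literal paired with a boolean flag standing for "reducible". All function names are hypothetical. `unfold` returns the resultants as pairs of an instantiated head and a residual body.

```python
import itertools

_ids = itertools.count()


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, str) and term in subst:
        term = subst[term]
    return term


def resolve(term, subst):
    """Apply a substitution exhaustively to a term."""
    term = walk(term, subst)
    if isinstance(term, str):
        return term
    return (term[0],) + tuple(resolve(a, subst) for a in term[1:])


def unify(a, b, subst):
    """Return an extended substitution, or None (no occurs check)."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str):
        return {**subst, a: b}
    if isinstance(b, str):
        return {**subst, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst


def rename(clause):
    """Give a clause fresh variables by appending a unique suffix."""
    n = next(_ids)

    def r(term):
        if isinstance(term, str):
            return "%s_%d" % (term, n)
        return (term[0],) + tuple(r(a) for a in term[1:])

    head, body = clause
    return r(head), [(r(atom), reducible) for atom, reducible in body]


def unfold(program, atom):
    """Unfold atom once, then keep unfolding leftmost reducible literals."""
    resultants = []

    def step(goal, subst, first):
        index = 0 if first else next(
            (i for i, (_, reducible) in enumerate(goal) if reducible), None)
        if index is None:  # only non-reducible literals left: emit a resultant
            resultants.append((resolve(atom, subst),
                               [resolve(a, subst) for a, _ in goal]))
            return
        for clause in program:
            head, body = rename(clause)
            extended = unify(goal[index][0], head, subst)
            if extended is not None:
                step(goal[:index] + body + goal[index + 1:], extended, False)

    step([(atom, True)], {}, True)
    return resultants


def example_program(reducible):
    """Program of Example 2 with both q literals marked as given."""
    return [
        (("p", "X"), [(("q", "X", "Y"), reducible), (("q", "Y", "Z"), reducible)]),
        (("q", ("a",), ("b",)), []),
        (("q", ("b",), ("a",)), []),
    ]


fully_unfolded = unfold(example_program(True), ("p", ("a",)))
one_step_only = unfold(example_program(False), ("p", ("a",)))
```

With both q literals reducible the only resultant is the fact p(a); with every literal non-reducible the single resultant keeps the body q(a, Y), q(Y, Z), whose second atom has a variable in the position the division declares static.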

From now on we will only use unfolding rules of the form $U_{\mathcal{A}}$ obtained from an
annotation $\mathcal{A}$ and our BTAs will thus return results of the form $\beta = (U_{\mathcal{A}}, A)$.

Given we have a BTC for a program P, in order to arrive at a concrete instance 
of Procedure 1 we now only need a (safe) generalisation operation, which we define 
in the following. 

Definition 14

We first define a family of mappings $gen_{\tau}$ from terms to terms, parameterised by
types, inductively as follows:

• $gen_{static}(t) = t$, for any term $t$

• $gen_{dynamic}(t) = V$, for any term $t$ and where $V$ is a fresh variable

• $gen_{nonvar}(f(t_1, \ldots, t_n)) = f(V_1, \ldots, V_n)$, where $V_1, \ldots, V_n$ are $n$ distinct
fresh variables

• $gen_{c(\tau'_1, \ldots, \tau'_k)}(f(t_1, \ldots, t_n)) = f(gen_{\tau_1}(t_1), \ldots, gen_{\tau_n}(t_n))$, if there exists a ground
instance in $Def_{\Gamma}(c)$ of the form $c(\tau'_1, \ldots, \tau'_k) \longrightarrow \ldots ; f(\tau_1, \ldots, \tau_n) ; \ldots$.

Let $\Delta$ be a division for some program $P$. We then define the partial mapping $gen_{\Delta}$
from atoms to atoms by:

• $gen_{\Delta}(p(t_1, \ldots, t_n)) = p(gen_{\tau_1}(t_1), \ldots, gen_{\tau_n}(t_n))$ if $\exists\, p(\tau_1, \ldots, \tau_n) \in \Delta$ such
that $p(t_1, \ldots, t_n) : p(\tau_1, \ldots, \tau_n)$.

We also define the generalisation operation $generalise_{\Delta}$ as follows: for a set $S$
of atoms which is safe wrt $\Delta$, $generalise_{\Delta}(S)$ is a minimal subset $S_1$ of $S_2 =
\{gen_{\Delta}(s) \mid s \in S\}$ such that for every element $s$ of $S_2$ there exists a variant of $s$ in
$S_1$.

For example, if $\Delta = \{p(static, dynamic), q(dynamic, static, nonvar)\}$ we have
$gen_{\Delta}(p(a,b)) = p(a,X)$ and $gen_{\Delta}(q(a,b,f(c))) = q(Y,b,f(Z))$. We also have that
$generalise_{\Delta}(\{p(a,b), q(a,b,f(c))\}) = \{p(a,X), q(Y,b,f(Z))\}$.

For $\Delta' = \{r(list(dynamic))\}$ (where $list(dynamic)$ is defined in Section 2.2) we have
that $gen_{\Delta'}(r([a,b,c])) = r([X,Y,Z])$, and $gen_{\Delta'}(r([H|T]))$ is undefined because the atom is
not safe wrt $\Delta'$.

As can be seen, $gen_{\Delta}$ is in general not total, but it is total for atoms safe wrt $\Delta$.
Hence, in the context of a globally safe BTA, $gen_{\Delta}$ and $generalise_{\Delta}$ will always be
defined.
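The mapping $gen_{\Delta}$ can be sketched in a few lines of Prolog. The following is only an illustration restricted to the three basic binding types static, dynamic and nonvar (user-defined types such as list(dynamic) are left out); the predicate names division/1, gen_term/3, gen_atom/2 and gen_args/3 are hypothetical and not taken from the LIX or LOGEN sources.

```prolog
% Hypothetical encoding of the division used in the example above:
% one fact per predicate, with binding types in the argument positions.
division(p(static, dynamic)).
division(q(dynamic, static, nonvar)).

% gen_term(+Type, +Term, -Generalised)
gen_term(static, T, T).             % static terms are kept as they are
gen_term(dynamic, _, _).            % dynamic terms become a fresh variable
gen_term(nonvar, T, G) :-
    nonvar(T),                      % an unbound term is not safe for nonvar
    functor(T, F, N),
    functor(G, F, N).               % same functor, fresh distinct arguments

% gen_atom(+Atom, -Generalised)
% Fails when the division has no entry for the predicate or when a
% nonvar argument is unbound, mirroring the partiality of gen.
gen_atom(Atom, Gen) :-
    functor(Atom, P, N),
    functor(Types, P, N),
    division(Types),
    Atom  =.. [P|Args],
    Types =.. [P|Ts],
    gen_args(Ts, Args, GArgs),
    Gen =.. [P|GArgs].

gen_args([], [], []).
gen_args([T|Ts], [A|As], [G|Gs]) :-
    gen_term(T, A, G),
    gen_args(Ts, As, Gs).
```

Under these assumptions, the query `?- gen_atom(q(a,b,f(c)), G).` binds G to an atom of the form q(Y,b,f(Z)) with Y and Z fresh, while `?- gen_atom(q(a,b,V), G).` fails for an unbound V.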

Proposition 1

For every division $\Delta$, $generalise_{\Delta}$ is safe wrt $\Delta$.

Based upon this generalisation operation, we can also define a corresponding
renaming and filtering operation:

Definition 15

Let $\|.\|$ be a fixed mapping from atoms to natural numbers such that $\|A\| =
\|B\|$ iff $A$ and $B$ are variants. We then define $filter_{\Delta}$ as follows: $filter_{\Delta}(A) =
p_{\|gen_{\Delta}(A)\|}(V_1\theta, \ldots, V_k\theta)$, where $A = gen_{\Delta}(A)\theta$, $p$ is the predicate symbol of $A$,
and $V_1, \ldots, V_k$ are the variables appearing in $gen_{\Delta}(A)$.

The purpose of the mapping $\|.\|$ is to assign to every specialised atom (i.e.,
every atom of the form $gen_{\Delta}(A)$) a unique identifier and predicate name, thus ensuring
the independence condition (Lloyd and Shepherdson 1991). The $filter_{\Delta}$ operation
will properly rename instances of these atoms and also filter out static parts, thus
improving the efficiency of the residual code (Gallagher and Bruynooghe 1990,
Benkerimi and Hill 1993). For example, given the division $\Delta = \{p(static, dynamic),
q(dynamic, static, nonvar)\}$, $\|p(a,X)\| = 1$, and $\|q(X,b,f(Y))\| = 2$, we have that
$filter_{\Delta}(p(a,b)) = p_1(b)$ as well as $filter_{\Delta}(q(a,b,f(c))) = q_2(a,c)$.
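The filtering step can be illustrated with a small Prolog sketch. It assumes that the generalised atom and its identifier have already been computed and are passed in explicitly; filter_atom/4 and residual_name/3 are hypothetical names introduced only for this illustration, and the residual predicate $p_i$ is written p__i.

```prolog
:- use_module(library(lists)).      % append/3

% filter_atom(+Atom, +Gen, +Id, -Filtered)
%   Gen is a fresh copy of the generalisation of Atom and Id plays the
%   role of ||Gen||. Gen is instantiated by the call.
filter_atom(Atom, Gen, Id, Filtered) :-
    term_variables(Gen, Vars),      % V1,...,Vk, collected before instantiation
    Gen = Atom,                     % computes theta and applies it to Vars
    functor(Atom, P, _),
    residual_name(P, Id, Name),
    Filtered =.. [Name|Vars].

% residual_name(+Pred, +Id, -Name): e.g. q and 2 give q__2.
residual_name(P, Id, Name) :-
    atom_codes(P, PCs),
    number_codes(Id, IdCs),
    atom_codes('__', Sep),
    append(PCs, Sep, Cs0),
    append(Cs0, IdCs, Cs),
    atom_codes(Name, Cs).
```

With these definitions, `?- filter_atom(q(a,b,f(c)), q(Y,b,f(Z)), 2, F).` gives F = q__2(a,c), which corresponds to $q_2(a,c)$ in the example above: the static argument b and the functor f have been filtered out.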

In the remainder of this paper we will use the following off-line partial deduction
method:

Procedure 2 (off-line partial deduction)

1. Perform a globally safe BTA (possibly by hand) returning results of the form
$(U_{\mathcal{A}}, \Delta)$.

2. Perform Procedure 1 with $U_{\mathcal{A}}$ as unfolding rule and $generalise_{\Delta}$ as generalisation
operation. The initial set of atoms $S_0$ should only contain atoms which
are safe wrt $\Delta$.

3. Construct the specialised program $P'$ using $filter_{\Delta}$ and the output $S$ of
Procedure 1 as follows: $P' = \{filter_{\Delta}(A)\theta \leftarrow filter_{\Delta}(B_1), \ldots, filter_{\Delta}(B_n) \mid
A\theta \leftarrow B_1, \ldots, B_n \in resultants(U_{\mathcal{A}}(P, A)) \wedge A \in S\}$.

Proposition 2

Let $(U_{\mathcal{A}}, \Delta)$ be a globally safe BTC for a program $P$. Let $S$ be a set of atoms safe
wrt $\Delta$. Then all sets $S_{new}$ and $S_{old}$ arising during the execution of Procedure 2 are
safe wrt $\Delta$.

Notably, if Procedure 2 terminates then the final set $S$ will be safe wrt $\Delta$. However,
none of our notions of safety actually ensures (local or global) termination
of Procedure 2. Termination is thus another issue (orthogonal to safety) which a
BTA has to worry about. Basically, the annotation $\mathcal{A}$ has to be such that for all
atoms $A$ which are safe wrt $\Delta$, $U_{\mathcal{A}}$ returns a finite SLDNF-tree $\tau$ for $P \cup \{\leftarrow A\}$.
Furthermore, $\Delta$ has to be such that $generalise_{\Delta}$ ensures that only finitely many
atoms can appear at the global level. We will return to this issue in Section 6.

We now illustrate Procedure 2 on a relatively simple example.

Example 3

We use a small generic parser for a set of languages which are defined by grammars
of the form $N ::= aN \mid X$ (where $a$ is a terminal symbol and $X$ is a placeholder
for a terminal symbol). The example is adapted from (Komorowski 1992) and the
(annotated) parser $P$ is depicted in Figure 2. The first argument to nont is the
value for $X$, while the other two arguments represent the string to be parsed as a
difference list.

1. Given the initial division $\Delta_0 = \{nont(static, dynamic, dynamic)\}$, a BTA
might return $\beta = (U_{\mathcal{A}}, \Delta)$ with $\Delta = \{nont(static, dynamic, dynamic), t(static,
dynamic, dynamic)\}$ and where $\mathcal{A}$ is represented in Figure 2. It can be seen
that $\beta$ is a globally and locally safe BTC for $P$.

2. Let us now perform the proper partial deduction for $S_0 = \{nont(c,T,R)\}$.
Note that the atom $nont(c,T,R)$ is safe wrt $\Delta_0$ (and hence also wrt $\Delta$).
Unfolding the atom in $S_0$ yields the SLD-tree in Figure 3. We see that the
only atom in the leaves is $nont(c,V,R)$, and we obtain $S_{old} = S_{new}$ (modulo
variable renaming).

3. The specialised program before and after filtering is depicted in Figure 4.
Note that, if one wishes to call the filtered version in exactly the same way
as the unfiltered one, one has to add the clause $nont(c,T,R) \leftarrow nont_1(T,R)$.

$nont(X,T,R) \leftarrow \underline{t}(a,T,V), nont(X,V,R)$
$nont(X,T,R) \leftarrow \underline{t}(X,T,R)$
$t(X,[X|R],R) \leftarrow$

Fig. 2. A very simple parser

Root: $\leftarrow nont(c,T,R)$
Left branch: $\leftarrow \underline{t}(a,T,V), nont(c,V,R)$, then with $T = [a|V]$: $\leftarrow nont(c,V,R)$
Right branch: $\leftarrow \underline{t}(c,T,R)$, then with $T = [c|R]$: $\Box$

Fig. 3. Unfolding the parser of Figure 2

$nont(c,[a|V],R) \leftarrow nont(c,V,R)$
$nont(c,[c|R],R) \leftarrow$

$nont_1([a|V],R) \leftarrow nont_1(V,R)$
$nont_1([c|R],R) \leftarrow$

Fig. 4. Unfiltered and filtered specialisation of Figure 2
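As a quick sanity check, the parser of Figure 2 and the filtered specialisation of Figure 4 can be loaded side by side. This is only an illustrative transcription into plain Prolog syntax; the name nont__1 stands for $nont_1$ and is a naming assumption.

```prolog
% Original parser (Figure 2), without annotations.
nont(X, T, R) :- t(a, T, V), nont(X, V, R).
nont(X, T, R) :- t(X, T, R).
t(X, [X|R], R).

% Filtered specialisation for nont(c,T,R) (Figure 4).
nont__1([a|V], R) :- nont__1(V, R).
nont__1([c|R], R).
```

For a ground input string both versions agree: `?- nont(c, [a,a,c], []).` and `?- nont__1([a,a,c], []).` both succeed, and both fail for the string [a,b]. The specialised version no longer passes the static argument c around and no longer calls t/3.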
Fig. 5. Overview of the mix approach

The LIX system

Based upon Procedure 2 we have implemented a concrete offline partial deduction
system called LIX using the traditional mix approach (Jones et al. 1993) depicted in
Figure 5. We will examine the power of this system in more detail in Section 5. As
we will see, provided that a good BTA is used, the quality of the specialised code
produced by LIX can be surprisingly good. As is to be expected, due to its offline
nature, LIX itself is very fast. In the next section, we show how the specialisation
speed can be further improved by using the cogen approach.

A crucial aspect for the performance of LIX is of course the quality of the
BTC. Also, the runtime of an automatic BTA can usually not be neglected, and it
could be considerably higher than that of LIX. However, in cases where the same
code is specialised over and over again, the cost of the BTA is much less significant,
as it only has to be run once. We will return to these issues in Sections 5 and 6.



3 The cogen approach for logic programming

Based upon the generic offline partial deduction framework presented in the previous
section, we will now describe the cogen approach to logic program specialisation.

3.1 General Overview

In the context of our framework, a generating extension for a program $P$ wrt a
given safe BTC $(U_{\mathcal{A}}, \Delta)$ for $P$ is a program that receives as its only input an atom
$A$ which is safe wrt $\Delta$, which it then specialises (using parts 2 and 3 of Procedure 2
with $S_0 = \{A\}$), thereby producing a specialised program $P_A$. In the particular
context of Example 3, a generating extension is a program that, when given the safe
atom $nont(c,T,R)$, produces the residual program shown in Figure 4.

In this section, we develop the compiler generator LOGEN; it is a program that,
given a program $P$ and a globally safe BTC $\beta = (U_{\mathcal{A}}, \Delta)$ for $P$, produces a generating
extension for $P$ wrt $\beta$.

An overview of the whole process is depicted in Figure 6 (the $\kappa$, $\gamma$, and $\sigma$ subscripts
will be explained in the next section), which also shows the differences with
the more traditional mix approach presented in Figure 5. As can be seen, $P$, $\mathcal{A}$,
and $\Delta$ have been compiled into the generating extension $genex^{P}_{\mathcal{A},\Delta}$ (contributing to
its efficiency and also making it standalone). A generating extension is thus not a
generic partial evaluator, but a highly specialised one: it can specialise the program
$P$ only for calls $A$ which are safe wrt $\Delta$ and it can only follow the annotation $\mathcal{A}$.

Fig. 6. Overview of the cogen approach

To explain and formalise the cogen approach, we will first examine the role and
structure of generating extensions $genex^{P}_{\mathcal{A},\Delta}$. Once this is clear we will consider how
the cogen can generate them.



3.2 The local control 

The crucial idea for simplicity (and efficiency) of the generating extensions is to
produce a specific "unfolding" predicate $p_u$ for each predicate $p/n$. Also, for every
predicate which is susceptible to appear at the global level, we will produce a specific
"memoisation" predicate $p_m$.

Let us first consider the local control aspect. The predicate $p_u$ has $n+1$ arguments
and is tailored towards unfolding calls to $p/n$. The first $n$ arguments correspond
to the arguments of the call to $p/n$ which has to be unfolded. The last argument
will collect the result of the unfolding process. More precisely, $p_u(t_1, \ldots, t_n, B)$ will
succeed for each branch of the incomplete SLDNF-tree obtained by applying the
unfolding rule $U_{\mathcal{A}}$ to $p(t_1, \ldots, t_n)$, whereby it will return in $B$ the atoms in the leaf of
the branch and also instantiate $t_1, \ldots, t_n$ via the composition of mgus of the branch
(see Figure 7). For atoms which get fully unfolded, the above can be obtained
very efficiently by simply executing the original predicate definition of $p$ for the
goal $\leftarrow p(t_1, \ldots, t_n)$ (no atoms in the leaves have to be returned because there are
none). To handle the case of incomplete SLDNF-trees we just have to adapt the
definition of $p$ so that unfolding of non-reducible atoms can be prevented and the
corresponding leaf atoms can be collected in the last argument $B$.

Left: unfolding $\leftarrow p(t_1, \ldots, t_n)$ with $U_{\mathcal{A}}$ yields a branch with substitution $\theta$ and leaf atoms $L_1, \ldots, L_m$.
Right: the call $p_u(t_1, \ldots, t_n, B)$ has the corresponding computed answer $\theta \cup \{B/[L_1, \ldots, L_m]\}$.

Fig. 7. Going from $p$ to $p_u$

All this can be obtained by transforming every clause for $p/n$ into a clause for
$p_u/(n+1)$ in the following manner. To simplify the presentation, we from now on
use the notation $p(\bar{t})$ to represent an atom of the form $p(t_1, \ldots, t_n)$ and also $p(\bar{s}, t)$
to represent an atom of the form $p(s_1, \ldots, s_n, t)$.

We first define the ternary relation $\kappa \leadsto \gamma : \sigma$. Intuitively (see Figure 6), $\kappa \leadsto \gamma : \sigma$
denotes that the cogen will produce from the annotated literal or conjunction $\kappa$
in the original program $P$ the calls $\gamma$ in the generating extension $genex^{P}_{\mathcal{A},\Delta}$. In turn,
the computed answers of $\gamma$ will instantiate $\sigma$ to the bodies of the residual clauses
that are part of the specialised program $P_A$. If $\gamma$ fails then no residual clause will
be produced. On the other hand, if $\gamma$ has several computed answers then several
residual clauses will be produced.

Definition 16

The ternary relation $\kappa \leadsto \gamma : \sigma$, with $\kappa$ denoting annotated conjunctions, $\gamma$ denoting
conjunctions and $\sigma$ denoting terms, is defined by the following three rules.
Remember that an underlined literal is selected for unfolding.

$\underline{p}(\bar{t}) \leadsto p_u(\bar{t}, C) : C$ ($C$ fresh variable)

$p(\bar{t}) \leadsto p_m(\bar{t}, C) : C$ ($C$ fresh variable)

$\dfrac{\kappa_i \leadsto \gamma_i : \sigma_i}{(\kappa_1, \ldots, \kappa_n) \leadsto (\gamma_1, \ldots, \gamma_n) : (\sigma_1, \ldots, \sigma_n)}$ (conjunctions)

The above relation can now be used to define the relation $\leadsto_u$ which transforms
a clause of $p$ into a clause for the efficient unfolder $p_u$.

$p(\bar{t}) \leftarrow \;\leadsto_u\; p_u(\bar{t}, true) \leftarrow$ (facts)

$\dfrac{\kappa \leadsto \gamma : \sigma}{p(\bar{t}) \leftarrow \kappa \;\leadsto_u\; p_u(\bar{t}, \sigma) \leftarrow \gamma}$ (rules)

Given an annotation $\mathcal{A}$ and a program $P$ we define $P^{\mathcal{A}}_u = \{c' \mid c \in P \wedge c \leadsto_u c'\}$.

Note that the transformation $\leadsto_u$, by means of the $\leadsto$ transformation, also generates
calls to $p_m$ predicates which we define later. These predicates take care of
the global control and also return a filtered and renamed version of the call to be
specialised as their last argument.

In the above definition, inserting a literal of the form $p_u(\bar{t}, C)$ corresponds to
further unfolding, whereas inserting $p_m(\bar{t}, C)$ corresponds to stopping local unfolding
and leaving the atom for the global control (something which is also referred to as
memoisation). In the case of the program $P$ from Example 3 with $\mathcal{A}$ as depicted in
Figure 2, we get the following program $P^{\mathcal{A}}_u$, where (V1,V2) and V1 represent $\sigma$ of
Definition 16:

nont_u(X,T,R,(V1,V2)) :- t_u(a,T,V,V1), nont_m(X,V,R,V2).
nont_u(X,T,R,V1) :- t_u(X,T,R,V1).
t_u(X,[X|R],R,true).

Suppose for the moment the simplest definition possible for nont_m (i.e., it performs
no global control, nor does it filter and rename):

nont_m(X,V,R,nont(X,V,R)).

Evaluating the above code for the call nont_u(c,T,R,Leaves) then yields two computed
answers which correspond to the two branches in Figure 3 and allow us to
reconstruct the unfiltered specialisation in Figure 4:

| ?- nont_u(c,T,R,Leaves).

T = [a|_A], Leaves = true,nont(c,_A,R) ? ;
T = [c|R], Leaves = true ? ;
no

3.3 The global control 

As mentioned, the above code for $P^{\mathcal{A}}_u$ is still incomplete, and we have to extend
it to perform the global control as well. Firstly, calling $p_u$ only returns the leaf
atoms of one branch of the SLDNF-tree, so we need to add some code that collects
the information from all the branches. This can be done very easily using Prolog's
findall predicate. 5

In essence, findall(V,Call,Res) finds all the answers $\theta_i$ of the call Call, applies $\theta_i$
to V and then instantiates Res to a list containing renamings of all the $V\theta_i$'s. In particular,
with the code above, findall(B,nont_u(c,T,R,B),Bs) instantiates Bs to
[(true,nont(c,_48,_49)),true]. This essentially corresponds to the leaves of the SLDNF-tree in Figure 3
(by flattening and removing the true atoms we obtain [nont(c,_48,_49)]). Furthermore,
if we call findall(clause(nont(c,T,R),Bdy), nont_u(c,T,R,Bdy), Cs) we will
get in Cs a representation of the two resultants of Figures 3 and 4 (without filtering).
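The collection step can be tried out directly. The sketch below repeats the unfolder for the parser together with the trivial nont_m from Section 3.2, and adds two helper predicates, flatten_body/2 and resultants/1, which are hypothetical names introduced here only to remove the true atoms and to wrap the findall call.

```prolog
:- use_module(library(lists)).      % append/3

nont_u(X, T, R, (V1,V2)) :- t_u(a, T, V, V1), nont_m(X, V, R, V2).
nont_u(X, T, R, V1)      :- t_u(X, T, R, V1).
t_u(X, [X|R], R, true).
nont_m(X, V, R, nont(X, V, R)).     % no global control, no filtering

% flatten_body(+Conj, -Atoms): list of the atoms of Conj, without true.
flatten_body((A,B), L) :- !,
    flatten_body(A, LA),
    flatten_body(B, LB),
    append(LA, LB, L).
flatten_body(true, []) :- !.
flatten_body(A, [A]).

% resultants(-Cs): one clause(Head, BodyAtoms) term per branch.
resultants(Cs) :-
    findall(clause(nont(c,T,R), Body),
            ( nont_u(c, T, R, Bdy), flatten_body(Bdy, Body) ),
            Cs).
```

Calling `?- resultants(Cs).` yields a list of two terms: clause(nont(c,[a|V],R),[nont(c,V,R)]) and clause(nont(c,[c|R],R),[]), up to variable renaming. These are the two unfiltered resultants of Figure 4.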

Now, once all the resultants have been generated, the body atoms have to be
generalised (using $gen_{\Delta}$) and then unfolded if they have not been encountered yet.
This is achieved by re-defining the predicates $p_m$ so that they perform the global
control. That is, for every atom $p(\bar{t})$ in the original program, if one calls $p_m(\bar{t}, R)$
then $R$ will be instantiated to the residual call of $p(\bar{t})$ (i.e. the call after applying
$filter_{\Delta}$; e.g., the residual call of $p(a,b,X)$ might be $p_1(X)$). At the same time $p_m$
also generalises this call, checks if it has already been encountered, and if not,
unfolds the atom to produce the corresponding residual code.

We have the following definition of $p_m$ (we denote the Prolog conditional by
If -> Then ; Else):

Definition 17

Let $P$ be a program and $p/n$ be a predicate defined in $P$. Also, let $\bar{v}$ be a sequence
of $n$ distinct variables (one for each argument of $p$). We then define the clause $C^{p,\Delta}_m$
for $p_m$ as follows:

p_m($\bar{v}$,R) :- ( find_pattern(p($\bar{v}$),R) -> true
              ; ( generalise(p($\bar{v}$),p($\bar{g}$)),
                  insert_pattern(p($\bar{g}$),Hd),
                  findall(clause(Hd,Bdy), p_u($\bar{g}$,Bdy), Cs),
                  pp(Cs),
                  find_pattern(p($\bar{v}$),R) ) ).

Finally we define the Prolog program $P^{\Delta}_m = \{C^{p,\Delta}_m \mid p \in Pred(P)\}$.

5 Note that, because our generating extensions do not have to be self-applied, we do not necessarily
have to specialise the findall predicate itself.

In the above, the predicate find_pattern checks whether its first argument $p(\bar{v})$
is an instance of a call that has already been specialised (or is in the process of
being specialised) and, if it is, its second argument will be instantiated to the
properly renamed and filtered version $filter_{\Delta}(p(\bar{v}))$ of the call. This is the classical
"seen before" check of partial evaluation (Jones et al. 1993) and is achieved by
keeping a list of the predicates that have been encountered before along with their
renamed and filtered calls. Thus, if the call to find_pattern succeeds, then R has
been instantiated to the residual call of $p(\bar{v})$. If the call was not seen before, then
the other branch of the conditional is executed.

The call generalise(p($\bar{v}$),p($\bar{g}$)) simply computes $p(\bar{g}) = gen_{\Delta}(p(\bar{v}))$.

The predicate insert_pattern adds a new atom (its first argument $p(\bar{g})$) to the list
of atoms already encountered and returns (in its second argument Hd) the renamed
and filtered version $filter_{\Delta}(p(\bar{g}))$ of the generalised atom. The atom Hd will provide
(maybe further instantiated) the head of the residual clauses.

This call to insert_pattern is put first to ensure that an atom is not specialised
over and over again at the global level.

The call to findall(clause(Hd,Bdy), p_u($\bar{g}$,Bdy), Cs) unfolds the generalised atom
$p(\bar{g})$ and returns a list of residual clauses for $filter_{\Delta}(p(\bar{g}))$ (in Cs). As we have seen
in Section 3.2, the call to p_u($\bar{g}$,Bdy) inside this findall returns one leaf goal of
the SLDNF-tree for $p(\bar{g})$ at a time and instantiates $p(\bar{g})$ (and thus also Hd) via
the computed answer substitution of the respective branch. Observe that every
atom $q(\bar{v})$ in the leaf goal has already been renamed and filtered by a call to the
corresponding predicate $q_m(\bar{v})$.

Finally, the predicate pp pretty-prints the clauses of the residual program, and
the last call to find_pattern will instantiate the output argument R to the residual call
$filter_{\Delta}(p(\bar{v}))$ of the atom $p(\bar{v})$ (which is different from Hd, which is $filter_{\Delta}(p(\bar{g}))$).

We can now fully define what a generating extension is: 

Definition 18

Let $P$ be a program and $(U_{\mathcal{A}}, \Delta)$ a strongly globally safe BTC for $P$. Then the
generating extension of $P$ with respect to $(U_{\mathcal{A}}, \Delta)$ is the Prolog program $P_g =
P^{\mathcal{A}}_u \cup P^{\Delta}_m$.

The generating extension is called as follows: if one wants to specialise an atom
$p(\bar{v})$ one simply calls $p_m(\bar{v}, R)$. Observe that generalisation and specialisation occur
as soon as we call $p_m(\bar{v}, R)$, and not after the whole incomplete SLDNF-tree has
been built. 6 Together with our particular construction of the unfolder predicates
(Definition 16) this means that to ensure correctness of specialisation we need to
have strong global safety instead of just global safety (cf. Definitions 9 and 10).

There are several ways to improve the definition of a generating extension
(Definition 18). The first improvement relates to the call generalise(p($\bar{v}$),p($\bar{g}$)) which
computes $p(\bar{g}) = gen_{\Delta}(p(\bar{v}))$. If the division for $p$ in $\Delta$ is simple (i.e., only contains
static and dynamic) one can actually compute $p(\bar{g}) = gen_{\Delta}(p(\bar{v}))$ beforehand (i.e.,
in the cogen as opposed to in the generating extension), without having to know
the actual values for the variables in $\bar{v}$. This will actually be used by our cogen,
whenever possible, to further improve the efficiency of the generating extensions.
For example, if we have $\Delta = \{p(static, dynamic)\}$ and $p(\bar{v}) = p(X,Y)$, then the
cogen does not have to generate a call to generalise/2; it can simply use p(X,Z)
for $p(\bar{g})$, where Z is a fresh variable, within the code for p_m(X,Y,R). The generating
extension will thus correctly keep the static values in X and abstract the dynamic
values in Y.

Second, in practice it might be unnecessary to define $p_m$ for every predicate
$p$. Indeed, there might be predicates which are never memoised. Such predicates
will never appear at the global level, and one can safely remove the corresponding
definitions for $p_m$ from Definition 18.

For instance, in Example 3 the predicate t/3 is always reducible and never specialised
immediately by the user. Also, the division is simple, and one can thus
pre-compute generalise. The resulting, optimised generating extension is shown in
Figure 8.

nont_m(B,C,D,FilteredCall) :-
    ( find_pattern(nont(B,C,D),FilteredCall) -> true
    ; ( insert_pattern(nont(B,F,G),FilteredHead),
        findall(clause(FilteredHead,Body),
                nont_u(B,F,G,Body), SpecClauses),
        pp(SpecClauses),
        find_pattern(nont(B,C,D),FilteredCall)
      )).

nont_u(B,C,D,(E,F)) :- t_u(a,C,G,E), nont_m(B,G,D,F).
nont_u(H,I,J,K) :- t_u(H,I,J,K).
t_u(L,[L|M],M,true).

Fig. 8. The generating extension for the parser 
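To experiment with Figure 8, the helper predicates find_pattern/2, insert_pattern/2 and pp/1 are needed. Their actual code belongs to LOGEN and is not reproduced here; the following is only a sketch under the assumption that the memo table is kept in the Prolog database. The names memo_table/2, counter/1, residual/2 and residual_name/3 are hypothetical.

```prolog
:- use_module(library(lists)).      % append/3
:- dynamic memo_table/2, counter/1, residual/2.

counter(0).

% find_pattern(+Call, -Residual): the "seen before" check.
find_pattern(Call, Residual) :-
    memo_table(Pattern, Head),      % fresh copy of a stored entry
    subsumes_term(Pattern, Call),   % Call is an instance of Pattern
    !,
    Pattern = Call,                 % instantiate Head accordingly
    Residual = Head.

% insert_pattern(+Gen, -Head): store Gen with its renamed, filtered head.
insert_pattern(Gen, Head) :-
    retract(counter(N)),
    N1 is N + 1,
    assertz(counter(N1)),
    functor(Gen, P, _),
    residual_name(P, N, Name),
    term_variables(Gen, Vars),      % ground (static) parts are filtered out
    Head =.. [Name|Vars],
    assertz(memo_table(Gen, Head)).

residual_name(P, N, Name) :-
    atom_codes(P, PCs),
    number_codes(N, NCs),
    atom_codes('__', Sep),
    append(PCs, Sep, Cs0),
    append(Cs0, NCs, Cs),
    atom_codes(Name, Cs).

% pp(+Clauses): print the residual clauses and also record them.
pp([]).
pp([clause(H,B)|Cs]) :-
    assertz(residual(H, B)),
    portray_clause((H :- B)),
    pp(Cs).

% Generating extension of Figure 8.
nont_m(B,C,D,FilteredCall) :-
    ( find_pattern(nont(B,C,D),FilteredCall) -> true
    ; ( insert_pattern(nont(B,F,G),FilteredHead),
        findall(clause(FilteredHead,Body),
                nont_u(B,F,G,Body), SpecClauses),
        pp(SpecClauses),
        find_pattern(nont(B,C,D),FilteredCall)
      )).

nont_u(B,C,D,(E,F)) :- t_u(a,C,G,E), nont_m(B,G,D,F).
nont_u(H,I,J,K) :- t_u(H,I,J,K).
t_u(L,[L|M],M,true).
```

With this sketch, the call `?- nont_m(c, T, R, Call).` prints two residual clauses for nont__0/2 and binds Call to nont__0(T,R). The recursive call in the first residual clause is resolved by find_pattern/2, because the pattern was inserted before unfolding started.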



3.4 The cogen LOGEN

The job of the cogen is now quite simple: given a program $P$ and a strongly globally
safe BTC $\beta$ for $P$, produce a generating extension for $P$ consisting of the two
parts described above. The code of the essential parts of our cogen, called LOGEN,
is shown in Appendix A. The predicate memo_clause generates the definition of
the global control m-predicates for each non-reducible predicate of the program,
whereas the predicates unfold_clause and body take care of translating clauses of
the original predicate into clauses of the local control u-predicates. Note how the
second argument of body corresponds to code of the generating extension whereas
the third argument corresponds to code produced at the next level, i.e. at the level
of the specialised program.

6 It is, however, not very difficult to change the cogen so that it calls $p_m(\bar{v}, R)$ only after the whole
incomplete SLDNF-tree has been built.

3.5 An Example 

We now show that LOGEN is actually powerful enough to satisfactorily specialise
the vanilla metainterpreter (a task which has attracted a lot of attention
(Cosmadopoulos, Sergot and Southwick 1991, Martens and De Schreye 1996, Vanhoof
and Martens 1997) and is far from trivial).

Example 4

The following is the well-known vanilla metainterpreter for the non-ground representation,
along with an encoding of the "double append" program:

demo(true).
demo((P & Q)) :- demo(P), demo(Q).
demo(A) :- dclause(A,Body), demo(Body).

dclause(append([],L,L), true).
dclause(append([H|X],Y,[H|Z]), append(X,Y,Z) & true).
dclause(dapp(X,Y,Z,R), (append(X,Y,I) & (append(I,Z,R) & true))).

Note that in a setting with just the static/dynamic binding types one cannot
specialise this program in an interesting way, because the argument to demo may
(and usually will) contain variables. This is why neither (Jørgensen and Leuschel
1996) nor (Mogensen and Bondorf 1992) were able to handle this example. We,
however, can produce the BTC $(U_{\mathcal{A}}, \Delta)$ with $\Delta = \{demo(nonvar), dclause(nonvar,
dynamic)\}$ and where the annotation $\mathcal{A}$ is such that every literal but the demo(P)
call in the second clause is marked as reducible (see underlining above).

Observe that, to make the BTA simpler, we encode conjunctions in a list-like fashion
within the second argument of dclause as follows: a conjunction $A_1 \wedge \ldots \wedge A_n$ will
be represented as $A_1 \,\&\, (\ldots (A_n \,\&\, true))$. This enables us to separate the conjunction
skeleton from the individual literals, and allows us to produce an annotation which
will result in removing all the parsing overhead related to the conjunction skeleton
but will not unfold potentially recursive literals within the conjunctions.

The importance of the nonvar annotation is its influence on the generalisation operation.
Indeed, we have $gen_{\Delta}(demo(append(X,[a],Z))) = demo(append(X,Y,Z))$,
whereas for $\Delta' = \{demo(dynamic), dclause(dynamic, dynamic)\}$ the generalisation
operation throws away too much information: $gen_{\Delta'}(demo(append(X,[a],Z)))
= demo(C)$, resulting in very little specialisation.

The demo_u unfolder predicate generated by the cogen for demo then looks like:

demo_u(true,true).
demo_u(B & C,(D,E)) :- demo_m(B,D), demo_u(C,E).
demo_u(F,(G,H)) :- dclause_u(F,I,G), demo_u(I,H).

The specialised code that is produced by the generating extension (after flattening)
for the call demo(dapp(X,Y,Z,R)) is:

demo__0(B,C,D,E) :- demo__1(B,C,F), demo__1(F,D,E).
demo__1([],B,B).
demo__1([C|D],E,[C|F]) :- demo__1(D,E,F).

Observe that specialisation has been successful: all the overhead has been compiled
away and demo__1 even corresponds to the definition of append. Given the above
BTC, LOGEN can achieve a similar feat for any object program and query to be
specialised. As we will see in Section 5, it can do so efficiently.

Finally, note that the inefficiency of traversing the first argument to dapp twice
has not been removed. For this, conjunctive partial deduction is needed (De Schreye
et al. 1999).
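The effect of the specialisation can be checked by running the metainterpreter and the residual program on the same query. The listing below is a plain transcription of both; the operator declaration for & is an assumption (the priority 950 is chosen here only so that & terms can appear as arguments), since the declaration is not shown in the text.

```prolog
:- op(950, xfy, &).                 % assumed declaration for the conjunction skeleton

% Vanilla metainterpreter with the "double append" object program.
demo(true).
demo((P & Q)) :- demo(P), demo(Q).
demo(A) :- dclause(A, Body), demo(Body).

dclause(append([], L, L), true).
dclause(append([H|X], Y, [H|Z]), append(X, Y, Z) & true).
dclause(dapp(X, Y, Z, R), (append(X, Y, I) & (append(I, Z, R) & true))).

% Residual program for the call demo(dapp(X,Y,Z,R)).
demo__0(B, C, D, E) :- demo__1(B, C, F), demo__1(F, D, E).
demo__1([], B, B).
demo__1([C|D], E, [C|F]) :- demo__1(D, E, F).
```

Both `?- demo(dapp([1],[2],[3],R)).` and `?- demo__0([1],[2],[3],R).` give R = [1,2,3] as their only answer; the residual version reaches it without any calls to dclause/2 or any traversal of the & skeleton.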

4 Extending LOGEN 

In this section we will describe how to extend LOGEN to handle logic programming
languages with built-ins and non-declarative features. We will explain these
extensions for Prolog, but many of the ideas should also carry over to other logic
programming languages. (Proponents of Mercury and Gödel may safely skip all but
Subsection 4.1.)

4.1 Declarative primitives

It is straightforward to extend LOGEN to handle declarative primitives, i.e. built-ins
such as =/2, is/2 and arg/3, 7 or externally defined user predicates (i.e., predicates
defined in another file or module, 8 as long as these are declarative).

7 E.g., arg/3 can be viewed as being defined by a (possibly infinite) series of facts: arg(1,h(X),X).,
arg(1,f(X,Y),X)., arg(2,f(X,Y),Y)., ...

8 Of course, doing a modular binding-time analysis is more difficult than doing an ordinary one,
but it is possible (Vanhoof 2000) and this is not really our concern here.

The code of these predicates is not available to the cogen and therefore no predicates
to unfold them can be generated. The generating extension can therefore do
one of two things:

1. either completely evaluate a call to such primitives (reducible case),

2. or simply produce a residual call (non-reducible case).

To achieve this, we simply extend the transformation of Definition 16 with the
following two rules, where $c$ is a call to a declarative primitive and reducible calls
are underlined:

$\underline{c} \leadsto c : true$
$c \leadsto true : c$

Example 5

For instance, we have $\underline{arg}(1,X,A) \leadsto arg(1,X,A) : true$, meaning that the call will be
executed in the generating extension and nothing has to be done in the specialised
program. On the other hand, we have $arg(N,X,A) \leadsto true : arg(N,X,A)$, meaning
that the call is only executed within the specialised program. Now take the clause:

p(X,N,A) :- arg(1,X,A), arg(N,X,A).

This clause is transformed (by $\leadsto_u$) into the following unfolding clause:

p_u(X,N,A,arg(N,X,A)) :- arg(1,X,A).

For $\Delta = \{p(static, dynamic, dynamic)\}$ and for X = f(a,b) the generating extension
will produce the residual code:

p__0(N,a) :- arg(N,f(a,b),a).

while for X = a the call arg(1,a,A) will fail and no code will be produced (i.e.,
failure has already been detected within the generating extension).

Observe that, while arg(1,a,A) fails in SICStus Prolog, it actually raises an error
in ISO Prolog. So, in the latter case we actually have to generate a residual clause
of the form p__0(N,A) :- raise_exception(...).

4.2 Problems with non-declarative primitives

The above two rules could also be used for non-declarative primitives. However, the
code generated will in general be incorrect, for the following two reasons.

First, for some calls $c$ to non-declarative primitives, $c, fail$ is not equivalent to $fail$.
For example, print(a),fail behaves differently from fail. Predicates $p$ for which
the conjunctions $p(\bar{t}), fail$ and $fail$ are not equivalent are termed "side-effect" predicates in
(Sahlin 1993). For such predicates independence of the computation rule does
not hold. In the context of the Prolog left-to-right computation rule, this means
that we have to ensure that failure to the right of such a call $c$ prevents neither
the generation of the residual code for $c$ nor its execution at runtime. For example,
the clause

t :- print(a), 2=3.

can be specialised to t :- print(a), fail. but not to t :- fail, print(a). nor
to t :- fail. nor to the empty program. The scheme of Section 4.1 would
produce the following unfolder predicate, which is incorrect as it produces the empty
program:

t_u(print(a)) :- 2=3.

The second problem is posed by the so-called "propagation sensitive" (Sahlin 1993) built-ins.
For calls $c$ to such built-ins, even though $c, fail$ and $fail$ are equivalent, the
conjunctions $c, X = t$ and $X = t, c$ are not. One such built-in is var/1: we have,
e.g., that (var(X),X=a) is not equivalent to (X=a,var(X)). Again, independence of
the computation rule is violated (even though there are no side-effects), which again
poses problems for specialisation. Take for example the following clause:

t(X) :- var(X), X=a.

The scheme of Section 4.1 would produce the following unfolder predicate:

t_u(X,var(X)) :- X=a.

Running this for X uninstantiated will produce the following residual code, which
is incorrect as it always fails:

t(a) :- var(a).

To solve this problem we will have to ensure that bindings generated by specialising
calls to the right of propagation sensitive calls $c$ do not backpropagate (Sahlin
1993, Prestwich 1992) onto $c$. In the case above, we have to prevent the binding
X/a from backpropagating onto the var(X) call.
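The difference is easy to observe at the Prolog top level. In the sketch below, t/1 is the original clause, while t_wrong/1 is the incorrect residual clause renamed so that both can be loaded together; the name t_wrong is introduced here only for this comparison.

```prolog
% Original clause: succeeds for an unbound argument and binds it to a.
t(X) :- var(X), X = a.

% Incorrect residual clause: the binding X/a has been backpropagated
% onto the var/1 test, so the test is applied to the atom a.
t_wrong(a) :- var(a).
```

The query `?- t(X).` succeeds with X = a, whereas `?- t_wrong(X).` fails, so the residual clause does not preserve the behaviour of the original one.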

In the remainder of this section we show how side-effect and propagation sensitive 
predicates can be dealt with in a rather elegant and still efficient manner in our 
cogen approach. 

4.3 Hiding failure and sensitive bindings

To see how we can solve our problems, we examine a small example in more detail.
Take the following program:

p(X) :- print(X), var(X), q(X).
q(a).

We have that $\underline{q}(X) \leadsto q\_u(X,C) : C$, and applying the scheme from Section 4.1 naively,
we get:

p_u(X,(print(X),var(X),C)) :- q_u(X,C).
q_u(a,true).

For the same reasons as in the above examples this unfolder predicate is incorrect
(e.g., for X=b the empty program is generated).

To solve the problem we have to avoid backpropagating the bindings generated
by q_u(X,C) onto print(X),var(X) and ensure that a failure of q_u(X,C) does not
prevent code being generated for print(X). The solution is to wrap q_u(X,C) into a
call to findall. Such a call will not instantiate q_u(X,C), and if q_u(X,C) fails this
will only lead to the third argument of findall being instantiated to an empty list.
To link up the solutions of the findall with the rest of the unfolding process we use
an auxiliary predicate make_disjunction. All this leads to the following extra rule,
to be added to Definition 16, and where calls whose bindings and whose failure
should be hidden are wrapped into a hide_nf annotation:

$\dfrac{\kappa \leadsto \gamma : \sigma}{hide\_nf(\kappa) \leadsto \big(varlist(\kappa, V),\; findall((\sigma, V), \gamma, R),\; make\_disjunction(R, V, C)\big) : C}$ ($R$, $V$, $C$ fresh variables)

The full code of make_disjunction is straightforward and can be found in
Appendix A.

One might wonder why in the above solution one just keeps track of the variables
in $\kappa$. The reason is that all the variables in $\gamma$ or $\sigma$ (in contrast to $\kappa$) cannot occur
in the remainder of the clause.

Note that annotating a call $c$ using hide_nf also prevents right-propagation of
bindings generated while specialising $c$. This is not a restriction, because instead of
writing $hide\_nf(\alpha), \beta$ we can always write $hide\_nf((\alpha, \beta))$ if one wants the instantiations
of $\alpha$ to be propagated onto $\beta$. Furthermore, preventing right-propagation will
turn out to be useful in the treatment of negations, conditionals, and disjunctions
below.

Example 6 

Let us trace the thus extended cogen on another example: 

p(X) :- print(X), q(X).
q(a).
q(b).

Let us mark q(X) as reducible and wrap it into a hide_nf annotation; the exact
representation of the annotated clause required for logen is:

ann_clause(1, p(X), (rescall(print(X)), hide_nf(unfold(q(X))))).

We now get the following unfolding predicate for p: 

p_u(X, (print(X), Disj)) :-
    varlist(q(X), Vars),
    findall((Code,Vars), q_u(X,Code), Cs),
    make_disjunction(Cs, Vars, Disj).

If we run the generating extension we get the residual program (calls to true 
have been removed by the cogen): 

p__0(B) :- print(B), (B = a ; B = b).

Instead of generating disjunctions, one could also produce new predicates for 
each disjunction (at least for those cases where argument indexing might be lost 
(Venken and Demoen 1988)). 
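
The unfolding predicate of Example 6 can be tried out with the following sketch. It is
not the code of the cogen: make_disjunction/3 below is our own simplified stand-in for
the version in Appendix A, varlist/2 is assumed to behave like the standard
term_variables/2, and no post-processing removes the calls to true.

```prolog
% varlist(+Term, -Vars): the variables of Term (assumption: same behaviour
% as the standard term_variables/2).
varlist(Term, Vars) :- term_variables(Term, Vars).

% make_disjunction(+Solutions, +Vars, -Code): hypothetical, simplified version.
% Each solution (Code, Inst) becomes the disjunct (Vars = Inst, Code);
% no solutions at all yields the residual code fail.
make_disjunction([], _, fail).
make_disjunction([(Code,Inst)], Vars, (Vars = Inst, Code)) :- !.
make_disjunction([(Code,Inst)|Rest], Vars, ((Vars = Inst, Code) ; Disj)) :-
    make_disjunction(Rest, Vars, Disj).

% Unfolder for q/1 (two facts, hence residual code true in both cases).
q_u(a, true).
q_u(b, true).

% Unfolder for p/1 with the call to q(X) wrapped into hide_nf.
p_u(X, (print(X), Disj)) :-
    varlist(q(X), Vars),
    findall((Code,Vars), q_u(X,Code), Cs),
    make_disjunction(Cs, Vars, Disj).

% ?- p_u(X, Code).
% X stays unbound and both solutions of q_u/2 end up as disjuncts:
%   Code = (print(X), (([X]=[a], true) ; ([X]=[b], true)))
```

Because findall/3 copies its solutions, the bindings X = a and X = b never reach
print(X); they only reappear as explicit unifications inside the disjunction.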

4.4 A solution for non-leftmost, non-determinate unfolding

It is well known that non-leftmost, non-determinate unfolding, while sometimes
essential for satisfactory propagation of static information, can cause substantial 
slowdowns. Below we show how our new hide_nf annotation can solve this dilemma 
(another solution is conjunctive partial deduction (Leuschel et al. 1996)). 

Example 7 

In the following expensive_predicate(X) is an expensive, but fully declarative pred- 
icate, which for some reason (e.g., termination) we cannot unfold. 

p(X) :- expensive_predicate(X), q(X), r(X).
q(a).  r(a).
q(b).  r(b).
q(c).

If we mark expensive_predicate(X) as non-reducible, and q(X) and r(X) as re- 
ducible we get the following residual program: 

p__0(a) :- expensive_predicate(a).
p__0(b) :- expensive_predicate(b).

This residual program has left-propagated the bindings, which is not a problem in
itself, but potentially duplicates computations and leads to a less efficient residual
program. A solution to this problem, which still allows one to unfold q(X) and r(X)
(and to right-propagate the bindings of q(X) onto r(X)), is to wrap them into a hide_nf
annotation. This is represented as the following annotated clause, where unfold is
wrapped around calls to be unfolded and rescall is wrapped around non-reducible
primitives:

ann_clause(1, p(X), (rescall(expensive_predicate(X)),
                     hide_nf((unfold(q(X)), unfold(r(X)))))).

We then get the following residual program: 

p__0(B) :- expensive_predicate(B), (B = a ; B = b).

4.5 Generating correct annotations

Having solved the problem of left-propagation of failure and bindings, we now just 
have to figure out when hide_nf annotations are actually necessary. In order to 
achieve maximum specialisation and efficiency, one would want to use just the 
minimum number of such annotations which still ensures correctness. 

First, we have to define a new relation $\models_{hide} \gamma$ that holds if the code $\gamma$ within the
generating extension cannot fail and cannot instantiate variables in the remainder
of the generating extension. This relation is defined in Figure 9. This definition can
actually be kept quite simple because it is intended to be applied to code in the
generating extension which has a very special form.

The following modified rule for conjunctions (replacing the corresponding rule in
Definition 16) ensures that no bindings are left-propagated or side-effects removed.

\[
\frac{\forall i:\ \kappa_i \leadsto \gamma_i : \alpha_i \ \wedge\
      \big(\mathit{impure}(\kappa_i) \Rightarrow \forall j > i:\ \models_{hide} \gamma_j\big)}
     {(\kappa_1, \ldots, \kappa_n) \leadsto (\gamma_1, \ldots, \gamma_n) : (\alpha_1, \ldots, \alpha_n)}
\]

Here $\mathit{impure}(\kappa_i)$ holds if $\kappa_i$ contains a call to a side-effect predicate (which has to be
non-reducible) or to a non-reducible propagation sensitive call. Calls are classified
as in (Sahlin 1993) (e.g., the property of generating a side-effect propagates up the
dependency graph). In case we want to prevent backpropagation of bindings on
expensive predicates as discussed in Section 4.4, then $\mathit{impure}(\kappa_i)$ should also hold
when $\kappa_i$ contains a call to a non-reducible, expensive predicate.

This modified rule for conjunctions together with Figure 9 can be used to determine
the required hide_nf annotations. For example, the first rule in Figure 9
actually implies that non-reducible calls never pose a problem and do not have to
be wrapped into a hide_nf annotation (because they produce the code $\gamma_i = \mathit{true}$
within the generating extension).

To further improve specialisation and efficiency one could also introduce additional
annotations such as nf($\kappa$) if only failure has to be prevented and hide($\kappa$)
if only bindings have to be hidden. This is actually done within the implementation
of the cogen, but, for clarity's sake, we do not elaborate on this here.
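\[
\frac{}{\models_{hide} \mathit{true}}
\qquad
\frac{\forall i:\ \models_{hide} \gamma_i}{\models_{hide} (\gamma_1, \gamma_2)}
\qquad
\frac{\forall i:\ \models_{hide} \gamma_i}{\models_{hide} (\gamma_1 ; \gamma_2)}
\]
\[
\frac{\forall i:\ \models_{hide} \gamma_i}{\models_{hide} (\gamma_1 \rightarrow \gamma_2 ; \gamma_3)}
\qquad
\frac{\mathit{hide\_nf}(\kappa) \leadsto \gamma : \alpha}{\models_{hide} \gamma}
\]

Fig. 9. The hide relation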

4.6 Negation and Conditionals

Prolog's negation (not/1) is handled similarly to a declarative primitive, except that
for the residual case not($\kappa$) we will also specialise the code $\kappa$ inside the negation
and we have to make sure that this specialisation (performed by the generating
extension) cannot fail (otherwise the code generation would be incorrectly prevented)
or propagate bindings.

\[
\frac{\kappa \leadsto \gamma : \mathit{true}}
     {\mathit{not}(\kappa) \leadsto \mathit{not}(\gamma) : \mathit{true}}
\qquad
\frac{\kappa \leadsto \gamma : \alpha \ \wedge\ \models_{hide} \gamma}
     {\mathit{not}(\kappa) \leadsto \gamma : \mathit{not}(\alpha)}
\]

The first rule is used when we know that $\kappa$ can be completely and finitely unfolded
and it can be determined whether $\kappa$ fails or not: if $\gamma$ succeeds then the generating
extension will not generate code, and if $\gamma$ fails the generating extension will succeed
and produce the residual code true for the negation. If we have $\kappa \leadsto \gamma : \alpha$ with
$\alpha \neq \mathit{true}$ then the annotation was wrong and an error will be raised during specialisation.
It is thus the responsibility of the BTC to ensure that such errors do not occur.

If the negation is non-reducible then we require that the generating extension
does not fail (the hide relation in the premiss). To enable the rule, $\kappa$ must be given
the hide_nf annotation unless $\gamma$ is already hidden. Again, this is the responsibility
of the BTC.

Example 8 

Consider the following two annotated clauses. 

p(X) :- not(X=a).
q(Y) :- not(Y=a).

In the first clause X is assumed to be of binding-type static (or at least nonvar) so
the negation can be reduced (see footnote 9). In the second we assume that Y is dynamic.
If we run the generating extension with goal p(a) we will get an empty program, which is
correct. If we run the generating extension with goal q(Y) we will get the following
(correct) residual clause:

q__0(B) :- not(B=a).
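
The two cases of Example 8 can be mimicked by hand-written unfolder predicates. This is
only a sketch under our own assumptions: the clause shapes follow the two rules for
negation, the names p_u/2 and q_u/2 follow the naming convention used above, and the
standard \+ is used where the generating extension itself has to execute a negation.

```prolog
% Reducible negation (first rule): the test X = a is executed inside the
% generating extension and the residual code is simply true.
p_u(X, true) :- \+ X = a.

% Non-reducible negation (second rule): the call Y = a is residualised, so
% the code run by the generating extension is just true (which cannot fail
% or bind variables) and the residual code is not(Y = a).
q_u(Y, not(Y = a)).

% ?- p_u(a, Code).   fails, i.e. an empty residual program for p(a)
% ?- p_u(b, Code).   Code = true
% ?- q_u(Y, Code).   Code = not(Y = a), with Y left unbound
```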

Handling conditionals is also straightforward. If the test goal of a conditional is 
reducible then we can evaluate the conditional within the generating extension. If 
the test goal of the conditional is non-reducible then, similarly to the negation, we 
require that the three subgoals in the generating extension do not fail nor propagate 
bindings: 

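\[
\frac{\forall i:\ \kappa_i \leadsto \gamma_i : \alpha_i}
     {(\kappa_1 \rightarrow \kappa_2 ; \kappa_3) \leadsto
      (\gamma_1 \rightarrow (\gamma_2, \alpha_2 = C) ; (\gamma_3, \alpha_3 = C)) : C}
\quad (C \text{ fresh variable})
\]
\[
\frac{\forall i:\ \kappa_i \leadsto \gamma_i : \alpha_i \ \wedge\ \models_{hide} \gamma_i}
     {(\kappa_1 \rightarrow \kappa_2 ; \kappa_3) \leadsto
      \gamma_1, \gamma_2, \gamma_3 : (\alpha_1 \rightarrow \alpha_2 ; \alpha_3)}
\]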

4.7 Disjunctions

To handle disjunctions we will use our hide_nf annotation to ensure that failure of 
one disjunct does not cause the whole specialisation to fail. It will also ensure that 
the bindings from one disjunct do not propagate over to other disjuncts. The rule 
for disjunctions therefore has the form: 

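\[
\frac{\forall i:\ \kappa_i \leadsto \gamma_i : \alpha_i \ \wedge\ \models_{hide} \gamma_i}
     {(\kappa_1 ; \ldots ; \kappa_n) \leadsto (\gamma_1, \ldots, \gamma_n) : (\alpha_1 ; \ldots ; \alpha_n)}
\]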

The above rule will result in a disjunction being created in the residual code. We 
could say that the disjunctions are residualised. It is possible to treat disjunction in 
a different way in which they are reduced away, but at the price of some duplication 
of work and residual code. The rule for such reducible disjunctions is: 

\[
\frac{\forall i:\ \kappa_i \leadsto \gamma_i : \alpha_i}
     {(\kappa_1 ; \ldots ; \kappa_n) \leadsto
      (\gamma_1, \alpha_1 = C ; \ldots ; \gamma_n, \alpha_n = C) : C}
\quad (C \text{ fresh variable})
\]

The drawback of this rule is that it may duplicate work and code. To see this
consider a goal of the form $Q_h, (Q_1 ; Q_2), Q_t$. If specialisation of $Q_1 ; Q_2$ does not
give any instantiation of the variables that occur in $Q_h$ and $Q_t$ then these will be
specialised twice and identical residual code will be generated each time.

Footnote 9: Note that it is up to the binding-type analysis to mark negations as
reducible only if this is sound, e.g., when the arguments are ground.

4.8 More refined treatment of the call predicate

In this section we present one example of specialisation using the call predicate 
and show how its specialisation can be further improved. The call predicate can 
be considered to be declarative (see footnote 10) and is important for implementing higher-order
primitives in Prolog. Unfortunately, current implementations of call are not very 
efficient and it would therefore be ideal if the overhead could be removed by spe- 
cialisation. This is exactly what we are going to do in this section. 

In call(C) the value of C can either be a call to a built-in or a user-defined 
predicate. Unless the predicate is externally defined the two cases require different 
treatment. Consider the following example, featuring the Prolog implementation of 
the higher-order map predicate: 

map(P, [], []).
map(P, [H|T], [PH|PT]) :- Call =.. [P,H,PH], call(Call), map(P,T,PT).
inc(X,Y) :- Y is X + 1.

Assume that we want to specialise the call map(inc,I,O). We can produce the
BTC ($\mathcal{A}$, $\Delta$) with $\Delta$ = {map(static, dynamic, dynamic), inc(dynamic, dynamic)} and
where $\mathcal{A}$ marks everything, but the =../2 call in clause 2, as non-reducible. Indeed,
since the value of Call is not known when we generate the unfolding predicate for
map we should in general not try to unfold the atom bound to Call. The unfolding
predicate generated by the cogen thus looks like:

map_u(B, [], [], true).
map_u(C, [D|E], [F|G], (call(H),I)) :- H =.. [C,D,F], map_m(C,E,G,I).

The specialised code obtained for the call map(inc,I,O) is:

map__0([], []).
map__0([B|C], [D|E]) :- inc(B,D), map__0(C,E).

All the overhead of call and =.. has been specialised away, but one still needs
the original program to evaluate inc. To overcome this limitation, one can devise a
special treatment for calls to user-defined predicates which enables unfolding within
a call/1 primitive:

\[
\mathit{call}(A) \leadsto \mathit{add\_extra\_argument}(\texttt{"\_u"}, A, C, G),\ \mathit{call}(G) : C
\quad (C \text{ fresh variable})
\]
\[
\mathit{call}(A) \leadsto \mathit{add\_extra\_argument}(\texttt{"\_m"}, A, C, G),\ \mathit{call}(G) : C
\quad (C \text{ fresh variable})
\]

In both cases the argument to call has to be a user-defined predicate which will 
be known by the generating extension but is not yet known at cogen time. If this is 
not the case one has to use the standard technique for built-ins and possibly keep 
the original program at hand. 

The code for add_extra_argument can be found in Appendix A. It is used to 
construct calls to the unfolder and memoisation predicates. For example, calling 
add_extra_argument("_u", p(a), C, Code) gives Code = p_u(a,C).
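
To make the behaviour of add_extra_argument concrete, here is one possible definition.
It is a sketch of ours, not the code from Appendix A: the suffix is passed as an atom
('_u') rather than as a double-quoted string so that the example does not depend on how
a given Prolog system reads double quotes, and append_arg/3 is a hypothetical helper.

```prolog
% add_extra_argument(+Suffix, +Call, ?Extra, -NewCall):
% NewCall is Call with Suffix appended to its predicate name and Extra
% added as an additional last argument.
add_extra_argument(Suffix, Call, Extra, NewCall) :-
    Call =.. [Pred|Args],
    atom_concat(Pred, Suffix, NewPred),
    append_arg(Args, Extra, NewArgs),
    NewCall =.. [NewPred|NewArgs].

% append_arg(+List, +X, -ListWithX): add X at the end of List
% (hypothetical helper, defined here to keep the sketch self-contained).
append_arg([], X, [X]).
append_arg([A|As], X, [A|Bs]) :- append_arg(As, X, Bs).

% ?- add_extra_argument('_u', p(a), C, Code).
% Code = p_u(a, C)
% ?- add_extra_argument('_m', inc(X, Y), C, Code).
% Code = inc_m(X, Y, C)
```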

Using this more refined treatment, the cogen will produce the following unfolder 
predicate: 



map_u(B, [], [], true).
map_u(C, [D|E], [F|G], (H,I)) :-
    J =.. [C,D,F], add_extra_argument("_u", J, H, K), call(K),
    map_m(C,E,G,I).

Footnote 10: If delayed until its argument is nonvar, call/1 can be viewed as being
defined by a series of facts: call(p(X)) :- p(X)., call(q(X,Y)) :- q(X,Y)., ...

The specialised code obtained for the call map(inc,I,O) is then:

map__0([], []).
map__0([B|C], [D|E]) :- D is B + 1, map__0(C,E).

All the overhead of map has been removed and we have even achieved unfolding 
of inc. 

In the case we know the length of the list, we can even go further and remove
the list processing overhead. In fact, we can now produce the BTC ($\mathcal{A}$, $\Delta'$) with
$\Delta'$ = {map(static, list(dynamic), dynamic), inc(dynamic, dynamic)}. If we then
specialise map(inc, [X, Y, Z], O) we obtain the following:

map__0(B, C, D, [E, F, G]) :- E is B + 1, F is C + 1, G is D + 1.

5 Experimental Results 

In this section we present a series of detailed experiments with our LOGEN system
as well as with some other specialisation systems. 

A first experimental evaluation of the cogen approach for Prolog was performed
in (Jørgensen and Leuschel 1996). However, due to the limitations of the initial
cogen only very few realistic examples could be analysed. Indeed, most interesting
partial deduction examples require the treatment of partially instantiated data, and
the initial cogen was thus not very useful in practice. The improved cogen of this
paper can now deal with such examples and we were able to run our system on a
large selection of benchmarks from (Leuschel 1996-2000). We only excluded those
benchmarks in (Leuschel 1996-2000) which are specifically tailored towards testing
tupling or deforestation capabilities (such as applast, doubleapp, flip, maxlength,
remove, rotate-prune, upto-sum, ...), as neither LOGEN nor LIX (nor MIXTUS) will
be able to achieve any interesting specialisation on them.

To test the ability to specialise non-declarative built-ins we also devised one 
new non-declarative benchmark: specialising the non-ground unification algorithm 
with occurs-check from page 152 of (Sterling and Shapiro 1986) for the query 
unify(f(g(a),a,g(a)),S). More detailed descriptions about all the benchmarks can
be found in (Leuschel 1996-2000). 

Our new LOGEN system runs under SICStus Prolog and is publicly available at
http://www.ecs.soton.ac.uk/~mal (along with the lix system). We compare the
results of LOGEN with the latest versions of MIXTUS (Sahlin 1993) (version 0.3.6) and
ECCE (Leuschel et al. 1998, De Schreye et al. 1999). (Comparisons of the initial cogen
with other systems such as logimix, paddy, and SP can be found in (Jørgensen and
Leuschel 1996)). For evaluation purposes, we will also compare with our traditional
offline specialiser lix, which performs exactly the same specialisation as LOGEN (and
works on exactly the same annotations). As we have LOGEN at our disposal, we
have not tried to make lix self-applicable, although we conjecture that, using our
extensions developed in Section 4, it should be feasible to do so (especially since
lix was derived from logen).

All the benchmarks were run under SICStus Prolog 3.7.1 on a Sun Ultra E450 
server with 256Mb RAM operating under SunOS 5.6. 
3860 ms    11520 ms    9120 ms    156
45 ms      0.77
58 ms      1
82 ms      1.40

Table 1. Specialisation Times



Specialisation Times 

A summary of all the transformation times can be found in Table 1. The times for
MIXTUS contain the time to write the specialised program to file (as we are not the
implementors of mixtus we were unable to factor this part out), as does the column
marked "with" for ECCE. The column marked "w/o" is the pure transformation time
of ECCE without measuring the time needed for writing to file. The times for logen
exclude writing to file. Note that ECCE can only handle declarative programs, and
could therefore not be applied on the ng_unify benchmark. For logen, the column
marked by cogen contains the runtimes of the cogen to produce the generating
extension, whereas the column marked by genex contains the times needed by the
generating extensions to produce the specialised programs. To be fair, it has to be
emphasised that the binding-type analysis for logen and lix was carried out by
hand. Thus, in a fully automatic system, the column with the cogen runtimes will
have to be increased by the time needed for the binding-type analysis. The same
Benchmark             Original   mixtus    ecce    logen / lix
--------------------------------------------------------------
advisor                   1        3.94     3.29       3.94
contains.kmp              1        5.17     6.2        4.89
ex_depth                  1        2.16     2.72       2.77
grammar                   1       14.40     9.60      15.16
groundunify.simple        1       14.00    14.00       1.56
groundunify.complex       1       14.33    14.33      14.33
imperative-solve          1        1.35     2.56       1.35
map.rev                   1        2.30     1.53       1.92
map.reduce                1        3.00     3.60       3.18
match.kmp                 1        1.46     1.93       1.15
model_elim                1        3.56     3.78       2.69
regexp.r1                 1        6.23     4.26       6.35
regexp.r2                 1        2.50     2.57       3.00
regexp.r3                 1        3.36     3.14       1.15
ssuply                    1       51.00    51.00      51.00
transpose                 1       22.71    22.71      22.71
ctl                       1        5.85     5.64       5.85
ng_unify                  1        4.44      -         3.72
--------------------------------------------------------------
Average Speedup           1        9.25     8.99       8.41
Total Speedup             1        3.63     3.89       2.83

Table 2. Speedups of the specialised programs



is true for the lix column. In general, the binding-type analysis will be the most
expensive operation in one-shot applications, and we will address this issue in more
detail in the next section. However, the binding-type analysis and the cogen have
to be run only once for every program and division. For example, the generating
extension produced for regexp.r1 was re-used without modification for regexp.r2 and
regexp.r3 while the one produced for map.rev was re-used for map.reduce. Another
example is the ctl interpreter for computation tree logic which is specialised over
and over again for different systems and different CTL temporal logic formulas,
e.g., in (Leuschel and Lehmann 2000). Hence, in a context where the same program
is specialised over and over again for different static values, the time devoted to the
BTA will usually become negligible.

In summary, the results in this section are valid in a setting where a knowledgeable 
user can produce a good and safe BTC by hand (we have developed a Tcl/Tk 
based graphical front end that helps the user by providing visual feedback about 
the annotations) and the same program is re-specialised multiple times. 

As can be seen in Table 1, LOGEN and LIX are the fastest specialisation systems
overall, running up to almost 3 orders of magnitude faster than the existing online
systems. LIX runs roughly 40% slower than the generating extensions of logen.
Note that for 3 benchmarks (contains.kmp, regexp.r2/3) the cost of running the
cogen is already recovered after a single specialisation. All in all, specialisation
times of both LOGEN and LIX are very satisfactory and seem to be more predictable
than those of online systems.

Quality of the Specialised Code 

Table 2 contains the speedups obtained by the various systems. The table also contains
the overall average speedup and total speedup. The latter is a fairer measure
than average speedup and is obtained by the formula

\[
\frac{n}{\sum_{i=1}^{n} \frac{spec_i}{orig_i}}
\]

where $n$ is the number of benchmarks and $spec_i$ and $orig_i$ are the absolute execution
times of the specialised and original programs respectively.
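
The difference between the two measures can be illustrated with a small sketch. The
predicates below are ours and work on a list of individual speedups $orig_i / spec_i$,
so that $spec_i / orig_i$ is simply the reciprocal of each entry; the input values in
the example are hypothetical and not taken from the tables.

```prolog
% sum_values/2 and inverse_sum/2 are small helpers defined here so that the
% sketch does not depend on a particular list library.
sum_values([], 0.0).
sum_values([X|Xs], S) :- sum_values(Xs, S0), S is S0 + X.

inverse_sum([], 0.0).
inverse_sum([X|Xs], S) :- inverse_sum(Xs, S0), S is S0 + 1.0 / X.

% average_speedup(+Speedups, -Avg): arithmetic mean of the speedups.
average_speedup(Speedups, Avg) :-
    length(Speedups, N), N > 0,
    sum_values(Speedups, Sum),
    Avg is Sum / N.

% total_speedup(+Speedups, -Total): n divided by the sum of spec_i/orig_i.
total_speedup(Speedups, Total) :-
    length(Speedups, N), N > 0,
    inverse_sum(Speedups, InvSum),
    Total is N / InvSum.

% ?- average_speedup([2.0, 10.0], A).   A is 6.0
% ?- total_speedup([2.0, 10.0], T).     T is 2 / (0.5 + 0.1), about 3.33
```

A single benchmark with a very large speedup dominates the average, whereas the total
speedup weights every benchmark by the running time that actually remains.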

As can be seen in Table 2, the specialisation performed by the LOGEN system
is not very far off the one obtained by MIXTUS and ECCE; sometimes LOGEN even
surpasses both of them (for ex_depth, grammar, regexp.r1 and regexp.r2). Being a
pure offline system, LOGEN cannot pass the KMP-test, which can be seen in the
timings for match.kmp in Table 2. (To be able to pass the KMP-test, more
sophisticated local control would be required, see (Martin and Leuschel 1999) and
the discussion below.)

Again, to be fair, both ECCE and mixtus are fully automatic systems guaran- 
teeing termination, while for LOGEN sufficient specialisation and termination had 
to be manually ensured by the user via the BTC. We return to this issue below.
Nonetheless, the LOGEN system is surprisingly fast and produces surprisingly good 
specialised programs. 

Finally, the figures of LOGEN in Tables 1 and 2 shine when compared to the
self-applicable SAGE system, where compiler generation usually takes more than
10 hours (with garbage collection) (Gurr 1994) and where the resulting generating
extensions are still pretty slow (Gurr 1994) (taking more than 100000 ms to produce
the specialised program; unfortunately self-applying SAGE is not possible for normal
users and we cannot make exact comparisons with logen).

6 Automating Binding-time Analysis 

Automating the process of binding-time analysis has received a lot of attention
in the context of functional and imperative languages (Bondorf and Jørgensen
1993, Consel 1993). In the context of logic programs, a major step in achieving
automatic binding-time analysis has recently been the use of termination analysis
(Bruynooghe, Leuschel and Sagonas 1998, Vanhoof and Bruynooghe 2001). In
what follows, we highlight the main aspects of (Vanhoof and Bruynooghe 2001) and
report on some experiments.

6.1 Automatic Binding-time Analysis 

When annotating a program, one generally wants to mark as many atoms reducible
as possible, while guaranteeing termination of the unfolding. In order to study the
termination characteristics of an unfolding rule $U_{\mathcal{A}}$ associated to an annotation $\mathcal{A}$,
we adopt a slightly different notion of annotation from (Vanhoof and Bruynooghe
2001). The basic idea is to represent the annotation $\mathcal{A}$ on a clause by a new clause
(which we will call a t-annotation) in which the non-reducible atoms are replaced
by true. This allows us to mimic unfolding using $U_{\mathcal{A}}$ by normal evaluation of the
corresponding t-annotation.

Definition 19 

Given a clause $H \leftarrow B_1, \ldots, B_n$, a t-annotated version of the clause is a clause
$H \leftarrow B'_1, \ldots, B'_n$, where for each $i$ such that $1 \leq i \leq n$, it holds that either
$B'_i = B_i$ or $B'_i = \mathit{true}$. A t-annotated version of a program $P = \bigcup_i C_i$ is a program
$P' = \bigcup_i C'_i$ such that for every clause $C_i$, it holds that $C'_i$ is a t-annotated
version of $C_i$.

Note that, according to Definition 19, every clause is a t-annotated version of itself.
Given an annotation $\mathcal{A}$ for a program $P$, we will denote with $P_{\mathcal{A}}$ the t-annotated
version of $P$ obtained by replacing the atoms that are marked non-reducible by
$\mathcal{A}$ with true. Note that there is a one-to-one correspondence between $\mathcal{A}$, $U_{\mathcal{A}}$ and
$P_{\mathcal{A}}$ and in what follows we will freely switch between them, referring simply to an
"annotated" program. The introduction of a t-annotation allows us to reason about the
termination behaviour of an unfolding rule $U_{\mathcal{A}}$ when unfolding $P \cup \{G\}$ by studying
the termination behaviour of $P_{\mathcal{A}}$ with respect to $G$. Indeed, if $P_{\mathcal{A}}$ terminates for a
goal $G$, then the (possibly incomplete) SLD-tree for $P \cup \{G\}$ built by $U_{\mathcal{A}}$ is finite
and vice versa.
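
Constructing a t-annotated clause body is a purely syntactic operation, as the following
sketch shows. The representation is our own assumption: a clause body is given as a list
of atoms and the annotation as a list of marks (reducible or non_reducible) of the same
length; t_annotate/3 is a hypothetical name.

```prolog
% t_annotate(+Body, +Marks, -TBody): TBody is Body in which every atom
% marked non_reducible has been replaced by true (cf. Definition 19).
t_annotate([], [], []).
t_annotate([B|Bs], [reducible|Ms], [B|Ts]) :-
    t_annotate(Bs, Ms, Ts).
t_annotate([_|Bs], [non_reducible|Ms], [true|Ts]) :-
    t_annotate(Bs, Ms, Ts).

% For the body of a clause p(X) :- q(X), r(X) in which r(X) is marked
% non-reducible:
% ?- t_annotate([q(X), r(X)], [reducible, non_reducible], T).
% T = [q(X), true]
```

Marking every atom as reducible returns the body unchanged, which corresponds to the
observation that every clause is a t-annotated version of itself.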

The above observation is the core of the algorithm developed by (Vanhoof and
Bruynooghe 2001), which computes a terminating t-annotation of a program $P$ for
a goal $G$. The basic intuition behind the algorithm, which is depicted in Fig. 10, is
as follows: suppose we have to annotate a program $P$ with respect to an initial goal
$G$. If we can prove that $G$ terminates with respect to $P$, the t-annotated version of
$P$ returned by the algorithm is simply $P$ itself (corresponding with a $P_{\mathcal{A}}$ in which
every atom is annotated reducible). Hence, $U_{\mathcal{A}}$ constructs a complete SLD-tree for
$P \cup \{G\}$ and specialisation of $G$ boils down to plain evaluation. If, on the other
hand, termination of $G$ with respect to the t-annotation under construction cannot
be proven by the analysis due to the presence of a possible loop, the algorithm tries
to remove the loop by replacing an atom by true. This process is repeated until the
constructed t-annotation, and hence the annotated program, is proven to be loop
free.

To characterise the possible loops in a program (or a t-annotation) $P$, the analysis
first identifies which of the atoms are loop-safe. Intuitively, an atom $B_i$ in a
clause $H \leftarrow B_1, \ldots, B_n \in P$ is said to be loop-safe if the analysis can prove that
a finite SLD-tree is built for any atom from the program's callset (the set of calls
that can possibly arise during evaluation of $P \cup \{G\}$) that unifies with $H$ if the
tree is constructed by unfolding only the $i$ leftmost body atoms of the clause under
consideration. Computing whether an atom is loop-safe is achieved by known techniques
of termination analysis. In our work, we followed the approach of (Codish
and Taboch 1999). A norm $\|.\|$ is chosen, mapping a term to a natural number,
and the program's callset is approximated by a finite abstract callset, denoted by
$calls^{\alpha}_{P}(G)$. Every call in $calls^{\alpha}_{P}(G)$ is of the form $p(b_1, \ldots, b_n)$ with $b_i$ a boolean
stating whether or not the size of that argument (according to the chosen norm) can
change upon further instantiation. More formally, we can define the generalisation
of a call $p(t_1, \ldots, t_n)$ as $p(\alpha_{\|.\|}(t_1), \ldots, \alpha_{\|.\|}(t_n))$, where $\alpha_{\|.\|}$ is defined as follows,
mapping terms onto the boolean domain {false, true} with false > true:

\[
\alpha_{\|.\|}(t) =
\begin{cases}
\mathit{true}  & \text{if } \|t\theta\| = \|t\| \text{ for any } \theta \\
\mathit{false} & \text{otherwise}
\end{cases}
\]



The abstract callset is kept monovariant, containing a single call per predicate,
by taking the predicate-wise least upper bound of the calls in the set. The system
then concludes loop-safeness of an atom $B_i$ in a clause $H \leftarrow B_1, \ldots, B_n$ if it can
show that there is a guaranteed decrease in size between $H$ and any recursive call
that may occur during unfolding of $B_1, \ldots, B_i$ given the calls in $calls^{\alpha}_{P}(G)$ and the
size relations between the sizes of the arguments in $B_1, \ldots, B_{i-1}$.

Given the atoms that are guaranteed to be loop safe, the algorithm identifies in 
each of the clauses the leftmost atom - if it exists - which is not proven to be loop 
safe, and removes one of these. Note that the algorithm is non deterministic, as 

Given a program P and initial goal G.
Let P_0 = P, S_0 = calls^α_P(G), k = 0.
repeat
    if there exists a clause i in P_k such that the j'th body atom
       cannot be proven to be loop-safe given S_k
    then
        let P_{k+1} be the program obtained by replacing the j'th
            body atom in the i'th clause in P_k by true and
        let S_{k+1} = S_k ⊔ calls^α_{P_{k+1}}(G)
    else
        P_{k+1} = P_k
    k = k + 1
until P_k = P_{k-1}
P' = P_k, S' = S_k

Fig. 10. The binding-time analysis algorithm.



several such clauses may exist. Also note the construction of the set S': starting
from the program's initial abstract callset S_0, in each round the predicate-wise least
upper bound is computed with the current t-annotation's abstract callset. Doing so
guarantees that the calls that are unfolded are correctly represented by an abstract
call in S', but it also ensures that S' contains abstractions of the (concrete instances
of the) calls that were replaced by true during the process. In other words, the set S'
contains an abstraction of every call that is encountered (unfolded or residualised)
during specialisation of P with respect to the initial goal G. Termination of the
algorithm is straightforward, since in every iteration an atom in a clause is replaced
by true, and the program only has a finite number of atoms.

Example 9 

Consider the meta interpreter depicted in Fig. 11. The interpreter has the member/2 
and append/3 predicates as object program. 
solve([]).
solve([A|Gs]) :- solve_atom(A), solve(Gs).
solve_atom(A) :- clause(A,Body), solve(Body).
clause(member(X,Xs), [append(_,[X|_],Xs)]).
clause(append([],L,L), []).
clause(append([X|Xs],Y,[X|Zs]), [append(Xs,Y,Zs)]).

Fig. 11. Vanilla meta interpreter

The binding-time analysis inherits from its underlying termination analysis (Codish 
and Taboch 1999) the need for a norm to be selected by the user. An often used 
norm on values of the type list(T) is the so-called listlength norm, counting the 
number of elements in a list. It is defined as follows: 

\[
\|[\,]\| = 0 \qquad \|[X|Xs]\| = 1 + \|Xs\|
\]
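
For terms of type list(T), the listlength norm and the question whether the size of a
term can still change under instantiation are easy to compute. The sketch below is ours
and deliberately restricted to that case: listlength/2 is only defined for nil-terminated
lists, and rigid_list/2 (a hypothetical name) returns true exactly when its argument is
such a list, which is when further instantiation cannot change its listlength.

```prolog
% listlength(+List, -N): the listlength norm of a nil-terminated list.
% Fails for open-ended lists such as [a|T] and for non-list terms.
listlength(L, 0) :- L == [].
listlength(L, N) :-
    nonvar(L), L = [_|Xs],
    listlength(Xs, M),
    N is M + 1.

% rigid_list(+Term, -B): B is true if Term is a nil-terminated list (its
% listlength is the same for every instance), and false otherwise.
rigid_list(T, true) :- listlength(T, _), !.
rigid_list(_, false).

% ?- listlength([a,b,c], N).     N = 3
% ?- rigid_list([X,Y], B).       B = true   (elements may be unbound)
% ?- rigid_list([a|T], B).       B = false  (the tail can still grow)
```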

Running the binding-time analysis of (Vanhoof and Bruynooghe 2001) on the program
depicted in Example 9 with respect to the listlength norm and the initial
goal solve([member(X,Xs)]) results in an annotated program in which the call to
solve_atom/1 is annotated non-reducible and every other call as reducible. The
resulting abstract callset is

{solve(true), solve_atom(false), clause(false, false)}

denoting that every call to solve/1 has an argument that is at least bound to a
list skeleton, whereas the arguments in calls to solve_atom/1 and clause/2 may
be of any instantiation.
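
The effect of this annotation can be observed by running the corresponding t-annotation
as an ordinary program. The sketch below makes two assumptions of ours: the object
program is stored in object_clause/2 instead of clause/2, because clause/2 is a built-in
predicate in most Prolog systems, and the last append clause uses [X|Zs] as in the usual
definition of append/3.

```prolog
% t-annotation of the vanilla meta interpreter: the call to solve_atom(A)
% in the second clause of solve/1 has been replaced by true.
solve([]).
solve([_A|Gs]) :- true, solve(Gs).
solve_atom(A) :- object_clause(A, Body), solve(Body).

% Object program (member/2 and append/3), renamed from clause/2.
object_clause(member(X,Xs), [append(_,[X|_],Xs)]).
object_clause(append([],L,L), []).
object_clause(append([X|Xs],Y,[X|Zs]), [append(Xs,Y,Zs)]).

% ?- solve([member(X,Xs)]).
% Succeeds exactly once and then terminates: the list of goals is a list
% skeleton of length 1, and solve_atom/1 is never entered.
```

In the original program the same query has infinitely many answers, since the goal
append(_,[X|_],Xs) is called with an unbound first and third argument.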

Note that there is a close correspondence between the abstract callset and a
(monovariant) division. If we define the concretisation function $\gamma_{\|.\|}$ mapping a
boolean to a type as $\gamma_{\|.\|}(b) = \tau$ where $\tau$ is the most general type such that for all
terms $t : \tau$ it holds that $\alpha_{\|.\|}(t) = b$, then we can define the division corresponding to
an abstract callset $S$ as

\[
\Delta = \{\, p(\gamma_{\|.\|}(b_1), \ldots, \gamma_{\|.\|}(b_n)) \mid p(b_1, \ldots, b_n) \in S \,\}.
\]

If $U_{\mathcal{A}}$ and $\Delta$ represent, respectively, the unfolding rule and the division corresponding
with the t-annotation and abstract callset computed by the binding-time algorithm,
then ($U_{\mathcal{A}}$, $\Delta$) is a globally safe binding-time classification for the program
under consideration.

Example 10 

The division corresponding with the callset above is 

$\Delta$ = {solve(list(dynamic)), solve_atom(dynamic), clause(dynamic, dynamic)}

where the parametric type list(.) is defined as before:

:- type list(T) ---> [] ; [T | list(T)].
6.2 Additional Experiments

Table 3 summarises a number of experiments that were run with the binding-time 
analysis of (Vanhoof and Bruynooghe 2001). We could not use all of the benchmarks 
from Section 5, because the current BTA is not yet capable of treating some of the 
built-ins required and, while it can handle partially static data, it can only handle 
one kind of partially static data (depending on the single norm with respect to 
which the program is analysed). 

The timings in Table 3 are in milliseconds and were measured on the same machine
and Prolog system used in Section 5. The second column (Round 1) presents
the timings for termination analysis of the original program (in which all calls
are annotated reducible). In case the outcome of the analysis is possible
non-termination, the third column presents the timings for termination analysis of the
program from which a call was removed. None of the benchmarks required more
than two rounds of the algorithm to derive a terminating t-annotation. The fourth
column then contains the total time needed to produce the generating extension using
logen and to run it on the partial deduction query. The final column contains
the specialisation time of MIXTUS (from Section 5) as a reference point.



Benchmark         Round 1     Round 2     LOGEN      Total      MIXTUS
----------------------------------------------------------------------
ex_depth          240.0 ms    230.0 ms     4.4 ms    474 ms     200 ms
match.kmp         470.0 ms    180.0 ms     2.4 ms    652 ms      50 ms
map.rev/reduce    200.0 ms       -         4.3 ms    204 ms     100 ms
regexp.r1-3       740.0 ms    280.0 ms    15.1 ms   1035 ms     670 ms
transpose         210.0 ms    150.0 ms     7.0 ms    367 ms     290 ms
----------------------------------------------------------------------
Total             2850 ms (both rounds)   34.5 ms   2885 ms    1330 ms

Table 3. Timings for the binding-time analysis and full specialisation.



Note that we slightly modified the transpose benchmark in the sense that the 
first argument is fully static. In fact, in the original transpose benchmark the first 
argument is a list skeleton whose first element in turn is a list skeleton but whose 
other elements are dynamic. This binding-type cannot be represented precisely by 
a semi-linear norm (which is required by the termination analysis of (Codish and 
Taboch 1999) underlying the binding-time analysis). 

Analysing Table 3, we can see that the binding-time analysis is indeed the most
expensive operation in a one-shot situation. However, the timings are not too
bad compared to MIXTUS, and the cost of the binding-time analysis will already
be recovered after a few specialisations (e.g., after 3 iterations for ex_depth
and after 2 iterations for regexp.r3). Table 4 contains a summary of the
speedups obtained by the LOGEN (or lix) system when using the annotations
obtained by the above binding-time analysis. For comparison's sake we have also
added the corresponding speedups using the methods of Section 5. Observe that,
as was probably to be expected, the automatically generated annotations lead to
lower speedups than the hand-crafted annotations. Indeed, the hand-crafted
annotations for ex_depth use the hide_nf annotation to prevent duplication of
expensive calls as described in Section 4.4, the hand-crafted annotations for
map.rev and map.reduce use the special annotations for the call primitive
described in Section 4.8, while for transpose the termination analysis of the
automatic BTA classified as non-terminating one call which is in fact
terminating. Nonetheless, the figures are still pretty good: for the 3 regexp
benchmarks we obtain exactly the same result as with the hand-crafted
annotation, and for match.kmp the automatic annotation actually outperforms the
hand-crafted one.

Benchmark        | Original  MIXTUS   ECCE  LOGEN hand | LOGEN automatic
-----------------+--------------------------------------+----------------
ex_depth         |    1       2.16    2.72     2.77     |      2.23
map.rev          |    1       2.30    1.53     1.92     |      1.53
map.reduce       |    1       3.00    3.60     3.18     |      1.29
match.kmp        |    1       1.46    1.93     1.15     |      1.34
regexp.r1        |    1       6.23    4.26     6.35     |      6.35
regexp.r2        |    1       2.50    2.57     3.00     |      3.00
regexp.r3        |    1       3.36    3.14     1.15     |      1.15
transpose        |    1      22.71   22.71    22.71     |      5.89
-----------------+--------------------------------------+----------------
Average Speedup  |    1       5.47    5.31     5.28     |      2.85
Total Speedup    |    1       2.84    2.85     2.31     |      1.93

Table 4. Speedups of the specialised programs
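
The break-even points quoted for Table 3 can be checked with a small
calculation. The sketch below is only an illustration and not part of the LOGEN
system: the predicate names are hypothetical, and the cost model is an
assumption, namely that the binding-time analysis (Round 1 plus Round 2) is
paid once while the whole LOGEN column is, pessimistically, paid again for
every specialisation.

```prolog
% timing(Benchmark, BtaMs, LogenMs, MixtusMs): figures copied from Table 3,
% where BtaMs is the sum of the Round 1 and Round 2 columns.
timing(ex_depth,     470.0,  4.4, 200.0).
timing(regexp_r1_3, 1020.0, 15.1, 670.0).

% offline_cost(+B, +N, -Ms): one BTA run plus N runs of LOGEN (assumed model).
offline_cost(B, N, Ms) :- timing(B, Bta, Logen, _), Ms is Bta + N * Logen.

% online_cost(+B, +N, -Ms): N independent runs of MIXTUS.
online_cost(B, N, Ms) :- timing(B, _, _, Mixtus), Ms is N * Mixtus.

% break_even(+B, -N): smallest N >= 1 for which the offline route is no slower.
break_even(B, N) :- break_even(B, 1, N).

break_even(B, N0, N) :-
    offline_cost(B, N0, Off),
    online_cost(B, N0, On),
    (   Off =< On
    ->  N = N0
    ;   N1 is N0 + 1,
        break_even(B, N1, N)
    ).
```

Under this model the query break_even(ex_depth, N) gives N = 3 and
break_even(regexp_r1_3, N) gives N = 2, in line with the iteration counts
mentioned in the text.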

The conducted experiments show that the approach is feasible and can be auto- 
mated. However, some issues regarding the current binding-time analysis remain. 
The analysis basically deals with boolean binding-times: either a value is instanti- 
ated enough with respect to a norm, or it is not. Recent research (Genaim, Codish, 
Gallagher and Lagoon 2002, Vanhoof and Bruynooghe 2002) shows that termi- 
nation proofs can be constructed by measuring the size of a term by means of a 
number of simple norms rather than using a single sophisticated norm. These simple 
norms basically count the number of subterms of the term that are of a particular 
type. In the presence of type information these norms can be constructed automat- 
ically. When combined with information that denotes whether further instantiating 
a term can introduce more subterms of the particular type they provide a more 
fine-grained characterisation of a term's size and instantiation. We conjecture such 
a more detailed characterisation to be a powerful and promising mechanism to 
derive an automatic binding-time analysis capable of constructing more precise 
binding-types. Also note that the current analysis only produces monovariant divi- 
sions. Polyvariance of the analysis can in principle be obtained by allowing several 
calls to the same predicate in the abstract callset, creating a new variant of the 
predicate definition for each abstract call and checking termination of each such 
predicate separately. 

In summary, at least for the experiments in Tables 3 and 4, we can conclude
that online systems are to be preferred - both in terms of speed and quality of
the specialised code - in one-shot situations where no expert user is available
to perform the annotation. Nonetheless, the quality of the fully automatic
LOGEN is satisfactory and the cost of the binding-time analysis will usually be
recovered already after a few specialisations. This means that the fully
automatic LOGEN might be useful in situations where the same program is
specialised multiple times and the specialisation time itself is of utmost
importance. Further work is needed to extend and refine the binding-time
analysis and to establish its scaling properties for larger programs.

7 Discussion and Future Work 
7.1 Related Work 

The first hand-written compiler generator based on partial evaluation principles
was, in all probability, the system RedCompile (Beckman, Haraldson, Oskarsson
and Sandewall 1976) for a dialect of Lisp. Since then successful compiler
generators have been written for many different languages and language
paradigms (Romanenko 1988, Holst 1989, Holst and Launchbury 1991, Birkedal and
Welinder 1994, Andersen 1994, Glück and Jørgensen 1995, Thiemann 1996).

In the context of definite clause grammars and parsers based on them, the idea 
of hand writing the compiler generator has also been used in (Neumann 1990, 
Neumann 1991). However, it is not based on (off-line) partial deduction. 

Also the construction of our program (Definition 16) is related to the idea of
abstract compilation (Hermenegildo, Warren and Debray 1992, Codish and Demoen
1995). In abstract compilation a program P is first transformed and abstracted.
Evaluation of this transformed program corresponds to the actual abstract
interpretation analysis of P. In our case concrete execution of our transformed
program performs (part of) the partial deduction process. Another similar idea
has also been used in (Tarau and De Bosschere 1994) to calculate abstract
answers. Finally, (Gallagher and Lafave 1996) uses a source-to-source
transformation similar to ours to compute trace terms for the global control of
logic and functional program specialisation (however, the specialisation
technique itself is still basically online).

The local control component of our generating extensions is still rather limited: 
either a call is always reducible or never reducible. To remedy this problem, and to 
allow any kind of partially instantiated data, an extension of our cogen approach 
has been developed in (Martin and Leuschel 1999). This approach uses a sounding 
analysis (at specialisation time) to measure the minimum depth of partially instan- 
tiated terms. The result of this analysis is then used to control the unfolding and 
ensure termination. This approach allows more aggressive unfolding than the tech- 
nique presented in this paper, passing the KMP-test and rivalling online systems in 
terms of flexibility. Due to the sounding analysis, however, it is not fully offline. In 
terms of speed of the specialisation process, it is hence slower than our fully offline 
cogen approach (but still much faster than online systems such as MIXTUS or ECCE).
Also, (Martin and Leuschel 1999) only addresses the local control component and
it is still unclear how it can be extended for the global control (the prototype
in (Martin and Leuschel 1999) uses the online ECCE system for global control; to
this end trace terms were built up in the generating extension like in
(Gallagher and Lafave 1996)).

Although our approach is closely related to the one for functional programming 
languages there are still some important differences. Since computation in our cogen 
is based on unification, a variable is not forced to have a fixed binding time assigned 
to it. In fact the binding-time analysis is only required to be safe, and this does not 
enforce this restriction. Consider, for example, the following program: 

g(X) :- p(X), q(X).
p(a).
q(a).

If the initial division A states that the argument to g is dynamic, then A is safe 
for the program and the unfolding rule that unfolds predicates p and q. The residual 
program that one gets by running the generating extensions is: 

g__0(a). 

In contrast to this any cogen for a functional language known to us will classify 
the variable X in the following analogous functional program (here exemplified in 
Scheme) as dynamic: 

(define (g X) (and (equal? X a) (equal? X a))) 

and the residual program would be identical to the original program. 

One could say that our system allows divisions that are not uniformly congruent
in the sense of Launchbury (Launchbury 1991) and, essentially, our system
performs specialisation that a partial evaluation system for a functional
language would need some form of driving (Glück and Sørensen 1994) to be able
to do. However, our divisions are still congruent: the value of a static
variable cannot depend on a dynamic value. In the above example, the value of X
within the call q(X), if reached, is always going to be a, no matter what the
argument to g is.
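
The effect of unification on the dynamic argument can be reproduced with a few
lines of Prolog. The following is only a sketch of the unfolding step and not
the generating extension itself: unfold/1 and residual_heads/2 are hypothetical
helper names, every call is treated as reducible, and the example program is
declared dynamic so that clause/2 may inspect it.

```prolog
:- dynamic g/1, p/1, q/1.

g(X) :- p(X), q(X).
p(a).
q(a).

% unfold(+Goal): resolve Goal completely against the program clauses,
% i.e. treat every call as reducible.
unfold(true) :- !.
unfold((A, B)) :- !, unfold(A), unfold(B).
unfold(Goal) :- clause(Goal, Body), unfold(Body).

% residual_heads(+Goal, -Heads): one residual fact per successful unfolding.
residual_heads(Goal, Heads) :- findall(Goal, unfold(Goal), Heads).
```

The query residual_heads(g(X), Hs) binds Hs to [g(a)], even though X was
unknown when unfolding started. Up to the renaming performed by filtering, this
is the residual fact shown above.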

7.2 Mixline Specialisation 

Some built-ins can be treated in a more refined fashion than described in Section 4. 
For instance, for a call var(X) which is non-reducible we could still check whether 
the call fails or succeeds in the generating extension. If the call fails, we know that 
it will definitely fail at runtime as well. In that case we don't have to generate code 
and we thus achieve improved specialisation over a purely offline approach. If the 
call var(X) succeeds, however, we have gained nothing and still have to perform 
var(X) at runtime. 

Similarly, for a call such as ground(X), if it succeeds in the generating
extension we can simply generate true in the specialised program. In that case
we have again improved the efficiency of the specialised program. If, on the
other hand, ground(X) fails in the generating extension it might still succeed
at runtime: we have to generate the code ground(X) and have gained nothing.

The rules below cater for such a mixline (Jones et al. 1993) treatment of some
built-ins.

  c ~> c : c                            if c = var(t), copy_term(s,t), s\==t, ...

  c ~> (c -> C = true ; C = c) : C      if c = ground(t), nonvar(t),
                                           atom(t), integer(t), s==t, ...

The code of the cogen in Appendix A uses these optimisations if a mixcall anno- 
tation is used (these annotations have not been used for the experimental results 
in Section 5). It also contains a mixline conditional, which reduces the conditional 
to the then branch (respectively else branch) if the test definitely succeeds (respec- 
tively definitely fails) in the generating extension. 
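
To make the two rules above concrete, here is a minimal sketch that performs
the mixline decision directly at specialisation time instead of generating
generating-extension code. It is an illustration under stated assumptions, not
the code of Appendix A: the names varlike/1, groundlike/1 and mix_call/2 are
hypothetical, and copy_term is left out to keep the example free of bindings.

```prolog
% Classification of built-ins, following the two rules above.
varlike(var(_)).
varlike(_ \== _).
groundlike(ground(_)).
groundlike(nonvar(_)).
groundlike(atom(_)).
groundlike(integer(_)).
groundlike(_ == _).

% mix_call(+Call, -Residual): executed at specialisation time.
% Fails if Call can never succeed at run time; otherwise Residual is the
% code to be placed in the specialised program.
mix_call(Call, Call) :-
    varlike(Call), !,
    call(Call).                         % failure now implies failure later
mix_call(Call, Residual) :-
    groundlike(Call), !,
    (   call(Call)
    ->  Residual = true                 % success now is definite
    ;   Residual = Call                 % undecided: keep the test
    ).
mix_call(Call, Call).                   % anything else is simply residualised
```

For example, mix_call(ground(f(a)), R) yields R = true, whereas
mix_call(ground(f(Y)), R) keeps the test and yields R = ground(f(Y)). The goal
mix_call(var(a), R) fails, so no code is generated for that branch.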

Similarly, one can also produce a new binding-type, called mix, which lies in 
between static and dynamic (Jones et al. 1993). Basically, mix behaves like static 
for the generalisation gen A (Definition 14) but like dynamic for filtering filter A 
(Definition 15). The former means that an argument marked as mix will not be 
abstracted away by gen A , while the latter allows such an argument to contain 
variables. Again, the code for these improvements can be found in Appendix A. 

Another worthwhile improvement is to enable mixline unfolding of predicates.
In other words, instead of either always or never unfolding a predicate, one
would like to decide whether to unfold the predicate based upon some (simple)
criterion. This improvement can be achieved, without having to change the cogen
itself, by modifying the annotation process. Indeed, instead of marking a call
p(t1,...,tn) either as reducible or non-reducible, we simply insert a static
conditional into the annotated program: (Test -> p(t1,...,tn) ; p(t1,...,tn)),
where the call in the then branch is marked as reducible and the call in the
else branch as non-reducible. Thus, if Test succeeds the generating extension
will unfold the call, otherwise it will be memoised.
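
In the concrete annotation syntax of Appendix A, such a static conditional
would presumably be written with the if, call, unfold and memo annotations. The
sketch below only mimics the decision taken at specialisation time: decide/2 is
a hypothetical helper, and the test ground(Pat) is an assumed example of a
simple unfolding criterion.

```prolog
% decide(+AnnotatedGoal, -Action): which treatment a call receives once the
% static test has been evaluated at specialisation time.
decide(if(call(Test), Then, Else), Action) :-
    (   call(Test)
    ->  decide(Then, Action)
    ;   decide(Else, Action)
    ).
decide(unfold(Call), unfold(Call)).
decide(memo(Call), memo(Call)).

% annotated_call(+Pat, +Str, -Ann): a hypothetical mixline annotation that
% unfolds the call m(Pat,Str) only when the pattern is fully known.
annotated_call(Pat, Str, if(call(ground(Pat)),
                            unfold(m(Pat, Str)),
                            memo(m(Pat, Str)))).
```

With a known pattern, annotated_call([a,b], S, Ann), decide(Ann, Action)
selects unfold(m([a,b],S)). With an unbound pattern the same annotation selects
the memo alternative.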

We have actually used these improvements to produce a mixline annotation of
the match.kmp benchmark from Section 5. The results of this experiment (after
some very simple post-processing) are as follows.

Program    | cogen    genex    spec. runtime   speedup
-----------+-------------------------------------------
match.kmp  | 1.2 ms   3.7 ms   2480 ms         1.51 x

Note that LOGEN now outperforms MIXTUS and passes the KMP-test (actually, even
without the post-processing; see (Sørensen and Glück 1999)).

7.3 More Future Work 

In addition to extending our BTA to generate hide_nf annotations and to fully 
integrate the BTA into the LOGEN system, one might also think of further extending 
its capabilities and domain of application. 

First, one could try to extend the cogen approach so that it can achieve
multi-level specialisation à la (Glück and Jørgensen 1995). One could also try
to use the cogen for run-time code generation. A first version of the latter has
in fact already been implemented; this actually does not require all that many
modifications to our cogen. The former also seems to be reasonably
straightforward to achieve.

Another interesting recent development is fragmental specialisation (Helsen and
Thiemann 2000), where the idea is to specialise fragments of the code (such as
modules) in the order in which they arrive. It should be possible to add such a
capability to our cogen, by using co-routining features (e.g., of SICStus
Prolog) so as to suspend, for predicates p defined in other fragments, calls to
the corresponding p_u or p_m predicates until the fragment defining p is
available.

One might also investigate whether the cogen approach can be ported to other
logic programming languages. It seems essential that such languages have some
metalevel built-in predicates, like Prolog's findall and call predicates, for
the method to be efficient. Further work is needed to establish whether it is
possible to adapt the cogen approach for Gödel (Hill and Lloyd 1994) or Mercury
(Somogyi et al. 1996) so that it still produces efficient generating extensions.

Finally, it also seems natural to investigate to what extent more powerful
control techniques (such as characteristic trees (Gallagher and Bruynooghe 1991,
Leuschel et al. 1998), trace terms (Gallagher and Lafave 1996) or the local
control of (Martin and Leuschel 1999)) and specialisation techniques (like
conjunctive partial deduction (Leuschel et al. 1996, Glück, Jørgensen, Martens
and Sørensen 1996, De Schreye et al. 1999)) can be incorporated into the cogen,
while keeping its advantages in terms of efficiency.

7.4 Conclusion

In the present paper we have formalised the concept of a binding-type analysis,
allowing the treatment of partially static structures, in a (pure) logic
programming setting, and shown how to obtain a generic procedure for offline
partial deduction from such an analysis. We have then developed the cogen
approach for offline specialisation, reaping the benefits of self-application
without having to write a self-applicable specialiser. The resulting system,
called LOGEN, is surprisingly compact and can handle partially static data
structures, declarative and non-declarative built-ins, disjunctions,
conditionals, and negation. We have shown that the resulting system achieves
fast specialisation in situations where the same program is re-specialised
multiple times. We have also overcome several limitations of earlier offline
systems and shown that LOGEN can be applied to a wide range of natural logic
programs and that the resulting specialisation is also very good, sometimes even
surpassing that of existing online systems. We have also developed the
foundation for a fully automatic binding-type analysis for the LOGEN system, and
have evaluated its performance on several examples.

A The Prolog cogen

This appendix contains the listing of the cogen. It works on an annotated version of 
the program to be specialised which contains definitions for the following predicates: 

• residual: defines the predicates by which the generating extension is to be
called, as well as the predicates which are residualised.

• filter: the division for the residual predicates.

• ann_clause: the annotated clauses, where calls in the body are annotated by:

- unfold for reducible user-defined predicates, and memo for non-reducible
user-defined predicates,

- call for reducible primitives (i.e., built-ins or open predicates; cf.
Section 4), and rescall for non-reducible primitives,

- semicall for non-reducible primitives to be specialised in a mixline fashion
(cf. Section 7.2),

- ucall for a call primitive calling a reducible user-defined predicate and
mcall for a call primitive calling a non-reducible user-defined predicate
(cf. Section 4.8),

- if and resif for reducible and non-reducible conditionals respectively,
and semif for conditionals to be specialised in a mixline fashion (cf.
Section 7.2),

- not and resnot for reducible and non-reducible negations respectively,

- ; and resdisj for reducible and non-reducible disjunctions respectively,

- hide, hide_nf to prevent the propagation of bindings and failure.

An example annotated file can be found in Appendix B. 
/* --------------------- */
/*         COGEN         */
/* --------------------- */

:- ensure_consulted('pp').

cogen :-
    findall(C, memo_clause(C), Clauses1),
    findall(C, unfold_clause(C), Clauses2),
    pp(Clauses1),
    pp(Clauses2).

memo_clause(clause(Head, (find_pattern(Call,V) ->
                            true ;
                            (insert_pattern(GCall,Hd),
                             findall(NClause,
                                     (RCall, NClause = clause(Hd,Body)),
                                     NClauses),
                             pp(NClauses),
                             find_pattern(Call,V))))) :-
    residual(Call), cogen_can_generalise(Call), generalise(Call,GCall),
    add_extra_argument("_u", GCall, Body, RCall),
    add_extra_argument("_m", Call, V, Head).

memo_clause(clause(Head, (find_pattern(Call,V) ->
                            true ;
                            (generalise(Call,GCall),
                             add_extra_argument("_u", GCall, Body, RCall),
                             insert_pattern(GCall,Hd),
                             findall(NClause,
                                     (RCall, NClause = clause(Hd,Body)),
                                     NClauses),
                             pp(NClauses),
                             find_pattern(Call,V))))) :-
    residual(Call), not(cogen_can_generalise(Call)),
    add_extra_argument("_m", Call, V, Head).

unfold_clause(clause(ResCall,FlatResBody)) :-
    ann_clause(_, Call, Body),
    add_extra_argument("_u", Call, FlatVars, ResCall),
    body(Body,ResBody,Vars), flatten(ResBody,FlatResBody), flatten(Vars,FlatVars).

body((G,GS),GRes,VRes) :-
    body(G,G1,V), filter_cons(G1,GS1,GRes,true),
    filter_cons(V,VS,VRes,true), body(GS,GS1,VS).

body(unfold(Call),ResCall,V) :- add_extra_argument("_u", Call, V, ResCall).
body(memo(Call),AVCall,VFilteredCall) :-
    add_extra_argument("_m", Call, VFilteredCall, AVCall).

body(true,true,true).
body(call(Call),Call,true).
body(rescall(Call),true,Call).
body(semicall(Call),GenexCall,ResCall) :-
    specialise_imperative(Call,GenexCall,ResCall).

body(if(G1,G2,G3),                                   /* Static if: */
     ((RG1) -> (RG2,(V=VS2)) ; (RG3,(V=VS3))), V) :-
    body(G1,RG1,_VS1), body(G2,RG2,VS2), body(G3,RG3,VS3).
body(resif(G1,G2,G3),                                /* Dynamic if: */
     (RG1,RG2,RG3),   /* RG1,RG2,RG3 shouldn't fail and be determinate */
     ((VS1) -> (VS2) ; (VS3))) :-
    body(G1,RG1,VS1), body(G2,RG2,VS2), body(G3,RG3,VS3).
body(semif(G1,G2,G3),                                /* Semi-online if: */
     (RG1, flatten(VS1,FlatVS1),
      ((FlatVS1 == true)
        -> (RG2, SpecCode = VS2)
        ;  ((FlatVS1 == fail)
             -> (RG3, SpecCode = VS3)
             ;  (RG2, RG3, (SpecCode = ((FlatVS1) -> (VS2) ; (VS3))))
           )
      )), SpecCode) :-
    /* RG1,RG2,RG3 shouldn't fail and be determinate */
    body(G1,RG1,VS1), body(G2,RG2,VS2), body(G3,RG3,VS3).

body(resdisj(G1,G2),(RG1,RG2),(VS1 ; VS2)) :-        /* residual disjunction */
    body(G1,RG1,VS1), body(G2,RG2,VS2).
body((G1;G2),((RG1,V=VS1) ; (RG2,V=VS2)),V) :-       /* static disjunction */
    body(G1,RG1,VS1), body(G2,RG2,VS2).

body(not(G1),\+(RG1),true) :- body(G1,RG1,_VS1).
body(resnot(G1),RG1,\+(VS1)) :- body(G1,RG1,VS1).

body(hide_nf(G1),GXCode,ResCode) :-
    (body(G1,RG1,VS1) ->
        (flatten(RG1,FlatRG1), flatten(VS1,FlatVS1),
         GXCode = (varlist(G1,VarsG1),
                   findall((FlatVS1,VarsG1),FlatRG1,ForAll1),
                   make_disjunction(ForAll1,VarsG1,ResCode))) ;
        (GXCode = true, ResCode = fail)).
body(hide(G1),GXCode,ResCode) :-
    (body(G1,RG1,VS1) ->
        (flatten(RG1,FlatRG1), flatten(VS1,FlatVS1),
         GXCode = (varlist(G1,VarsG1),
                   findall((FlatVS1,VarsG1),FlatRG1,ForAll1),
                   ForAll1 = [_|_],                  /* detect failure */
                   make_disjunction(ForAll1,VarsG1,ResCode))) ;
        (GXCode = true, ResCode = fail)).

/* some special annotations: */

body(ucall(Call), (add_extra_argument("_u",Call,V,ResCall), call(ResCall)), V).
body(mcall(Call), (add_extra_argument("_m",Call,V,ResCall), call(ResCall)), V).

make_disj([],fail).
make_disj([H],H) :- !.
make_disj([H|T],(H ; DT)) :- make_disj(T,DT).

make_disjunction([],_,fail).
make_disjunction([(H,CRG)],RG,FlatCode) :-
    !, simplify_equality(RG,CRG,EqCode), flatten((EqCode,H),FlatCode).
make_disjunction([(H,CRG)|T],RG,(FlatCode ; DisT)) :-
    simplify_equality(RG,CRG,EqCode), make_disjunction(T,RG,DisT),
    flatten((EqCode,H),FlatCode).

specialise_imperative(Call,Call,Call) :- varlike_imperative(Call), !.
specialise_imperative(Call,(Call -> (Code=true) ; (Code=Call)),Code) :-
    groundlike_imperative(Call), !.
specialise_imperative(X,true,X).

varlike_imperative(var(_X)).
varlike_imperative(copy_term(_X,_Y)).
varlike_imperative((_X\==_Y)).
groundlike_imperative(ground(_X)).
groundlike_imperative(nonvar(_X)).
groundlike_imperative(_X==_Y).
groundlike_imperative(atom(_X)).
groundlike_imperative(integer(_X)).

generalise(Call,GCall) :-
    ((filter(Call,ArgTypes), Call =.. [F|FArgs],
      l_generalise(ArgTypes,FArgs,GArgs))
      -> (GCall =.. [F|GArgs])
      ;  (print('*** WARNING: unable to generalise: '), print(Call), nl,
          GCall = Call)).

cogen_can_generalise(Call) :-
    filter(Call,ArgTypes),
    static_types(ArgTypes).   /* check whether we can filter at cogen time */

/* types which allow generalisation/filtering at cogen time */
static_types([]).
static_types([static|T]) :- static_types(T).
static_types([dynamic|T]) :- static_types(T).

generalise(static,Argument,Argument).
generalise(dynamic,_Argument,_FreshVariable).
generalise(free,_Argument,_FreshVariable).
generalise(nonvar,Argument,GenArgument) :-
    nonvar(Argument), Argument =.. [F|FArgs],
    make_fresh_variables(FArgs,GArgs), GenArgument =.. [F|GArgs].
generalise((Type1 ; _Type2),Argument,GenArgument) :-
    generalise(Type1,Argument,GenArgument).
generalise((_Type1 ; Type2),Argument,GenArgument) :-
    generalise(Type2,Argument,GenArgument).
generalise(type(F),Argument,GenArgument) :-
    typedef(F,TypeExpr), generalise(TypeExpr,Argument,GenArgument).
generalise(struct(F,TArgs),Argument,GenArgument) :-
    nonvar(Argument), Argument =.. [F|FArgs],
    l_generalise(TArgs,FArgs,GArgs), GenArgument =.. [F|GArgs].
generalise(mix,Argument,Argument).   /* treat as static for generalisation */

l_generalise([],[],[]).
l_generalise([Type1|TT],[A1|AT],[G1|GT]) :-
    generalise(Type1,A1,G1), l_generalise(TT,AT,GT).

make_fresh_variables([],[]).
make_fresh_variables([_|T],[_|FT]) :- make_fresh_variables(T,FT).

typedef(list(T),(struct([],[]) ; struct('.',[T,type(list(T))]))).
typedef(model_elim_literal,(struct(pos,[nonvar]) ; struct(neg,[nonvar]))).

add_extra_argument(T,Call,V,ResCall) :-
    Call =.. [Pred|Args], res_name(T,Pred,ResPred),
    append(Args,[V],NewArgs), ResCall =.. [ResPred|NewArgs].

res_name(T,Pred,ResPred) :-
    name(PE_Sep,T), string_concatenate(Pred,PE_Sep,ResPred).

filter_cons(H,T,HT,FVal) :-
    ((nonvar(H), H = FVal) -> (HT = T) ; (HT = (H,T))).



B The Parser Example 

The annotated program looks like: 

/* file: parser.ann */
static_consult([]).
residual(nont(_,_,_)).
filter(nont(X,T,R),[static,dynamic,dynamic]).
ann_clause(1,nont(X,T,R),(unfold(t(a,T,V)),memo(nont(X,V,R)))).
ann_clause(2,nont(X,T,R),(unfold(t(X,T,R)))).
ann_clause(3,t(X,[X|Es],Es),true).

This supplies cogen with all the necessary information about the parser program,
that is, the code of the program (with annotations) and the result of the
binding-time analysis. The predicate filter defines the division for the program
and the predicate residual represents the set C in the following way. If
residual(A) succeeds for a call A then the predicate symbol p of A is in
Pred(P)\C and p is therefore one of the predicates for which an m-predicate is
going to be generated. The annotations unfold and memo are used by cogen to
determine whether or not to unfold a call.

The generating extension produced by cogen for the annotation nont(s,d,d) is: 

/* file: parser.gx */
/* -------------------- */
/* GENERATING EXTENSION */
/* -------------------- */
:- logen_reconsult('memo').
:- logen_reconsult('pp').
nont_m(B,C,D,E) :-
  (( find_pattern(nont(B,C,D),E)
   ) -> ( true ) ; (
     insert_pattern(nont(B,F,G),H),
     findall(I, (nont_u(B,F,G,J), I = (clause(H,J))), K),
     pp(K), find_pattern(nont(B,C,D),E)
   )).
nont_u(B,C,D,','(E,F)) :- t_u(a,C,G,E), nont_m(B,G,D,F).
nont_u(H,I,J,K) :- t_u(H,I,J,K).
t_u(L,[L|M],M,true).

The generating extension is usually executed using nont_m, whereby the last
argument is instantiated to the filtered version of the call under
consideration. E.g., to specialise the original program for nont(c,T,R) we call
nont_m(c,T,R,FCall), which instantiates FCall to nont__0(T,R) and prints the
following residual program:

nont__0([a|B],C) :-
    nont__0(B,C).
nont__0([c|D],D).

Observe that we can use the computed answer substitution for FCall to produce 
an interface definition clause: 

nont(c,T,R) :- nont__0(T,R) . 

This will be done automatically by the logen system when it produces the 
specialised program. 
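
As a sanity check, the residual program can be compared with the original
parser on a concrete token list. The sketch below reconstructs the original
clauses from the annotated file above; agree/1 is a hypothetical helper
introduced only for this comparison.

```prolog
% Original parser program, obtained by dropping the annotations.
nont(X, T, R) :- t(a, T, V), nont(X, V, R).
nont(X, T, R) :- t(X, T, R).
t(X, [X|Es], Es).

% Residual program printed by the generating extension for nont(c,T,R).
nont__0([a|B], C) :- nont__0(B, C).
nont__0([c|D], D).

% agree(+Tokens): both programs compute the same list of remainders.
agree(Tokens) :-
    findall(R, nont(c, Tokens, R), Rs),
    findall(R, nont__0(Tokens, R), Rs).
```

For instance, agree([a,a,c]) succeeds: both programs consume the whole input
and return the empty remainder. The residual version no longer passes the
static argument c around and no longer calls t/3.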

Some other examples which can be handled by simple divisions (i.e., using just
the binding-types static and dynamic), such as an interpreter for the ground
representation (where the overhead is compiled away) and a "special" regular
expression parser from (Mogensen and Bondorf 1992) (where we obtain a
deterministic automaton after specialisation), can be found in (Jørgensen and
Leuschel 1996).

C The Transpose Example 

A possible annotated program of the transpose benchmark program for matrix 
transposition looks like: 

static_consult([]).
residual(transpose(A,B)).
filter(transpose(A,B),[type(list(type(list(dynamic)))),dynamic]).
ann_clause(1,transpose(A,[]),unfold(nullrows(A))).
ann_clause(2,transpose(A,[B|C]),
           (unfold(makerow(A,B,D)),unfold(transpose(D,C)))).
filter(makerow(A,B,C),[type(list(type(list(dynamic)))),dynamic,dynamic]).
ann_clause(3,makerow([],[],[]),true).
ann_clause(4,makerow([[A|B]|C],[A|D],[B|E]),unfold(makerow(C,D,E))).
filter(nullrows(A),[type(list(type(list(dynamic))))]).
ann_clause(5,nullrows([]),true).
ann_clause(6,nullrows([[]|A]),unfold(nullrows(A))).

In the above we stipulate that the first argument to transpose will be of type 
list(list(dynamic)) , i.e., a list skeleton whose elements are in turn list skeletons (in 
other words we have a matrix skeleton, without the actual matrix elements). The 
generating extension produced by cogen then looks like this: 

/* file: bench/transpose.gx */
/* -------------------- */
/* GENERATING EXTENSION */
/* -------------------- */
:- logen_reconsult('memo').
:- logen_reconsult('pp').
transpose_m(B,C,D) :-
  (( find_pattern(transpose(B,C),D)
   ) -> ( true ) ; (
     generalise(transpose(B,C),E), add_extra_argument([95,117],E,F,G),
     insert_pattern(E,H), findall(I, (G, I = (clause(H,F))), J),
     pp(J), find_pattern(transpose(B,C),D)
   )).
transpose_u(B,[],C) :- nullrows_u(B,C).
transpose_u(D,[E|F],','(G,H)) :- makerow_u(D,E,I,G), transpose_u(I,F,H).
makerow_u([],[],[],true).
makerow_u([[J|K]|L],[J|M],[K|N],O) :- makerow_u(L,M,N,O).
nullrows_u([],true).
nullrows_u([[]|P],Q) :- nullrows_u(P,Q).

Running the generating extension for transpose([[a,b],[c,d]],R) leads to the
following specialised program (and full unfolding has been achieved):

transpose([[a,b],[c,d]],A) :- transpose__0(a,b,c,d,A).
transpose__0(B,C,D,E,[[B,D],[C,E]]).
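
The five arguments of transpose__0 arise from generalising the call with
respect to the division and then keeping only the dynamic parts. The sketch
below illustrates this for divisions built from static, dynamic and list types
only. It is a simplified stand-in for generalise and filtering, not the code of
Appendix A: gen/3, gen_args/3 and filtered_call/4 are hypothetical names, list
types are handled directly rather than through typedef and struct, and taking
the variables of the generalised call as the filtered arguments is an
assumption that holds only for such simple divisions.

```prolog
% gen(+Type, +Arg, -GenArg): replace the dynamic parts of Arg by fresh
% variables; Arg must at least be a list skeleton where Type is list(_).
gen(static, Arg, Arg).
gen(dynamic, _Arg, _Fresh).
gen(list(_), [], []).
gen(list(T), [A|As], [G|Gs]) :- gen(T, A, G), gen(list(T), As, Gs).

% gen_args(+Types, +Args, -GenArgs): generalise an argument list pointwise.
gen_args([], [], []).
gen_args([T|Ts], [A|As], [G|Gs]) :- gen(T, A, G), gen_args(Ts, As, Gs).

% filtered_call(+Name, +Types, +Args, -Filtered): the residual call keeps one
% argument per fresh variable of the generalised call.
filtered_call(Name, Types, Args, Filtered) :-
    gen_args(Types, Args, GArgs),
    term_variables(GArgs, Vars),
    Filtered =.. [Name|Vars].
```

Calling filtered_call(transpose__0, [list(list(dynamic)), dynamic],
[[[a,b],[c,d]], R], F) generalises the matrix to [[_,_],[_,_]] and builds a
call to transpose__0 with five arguments: four for the matrix elements and one
for the result.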

For the particular DPPD benchmark query used in Section 5 we actually had to
use a slightly more refined division:

filter(transpose(A,B),[(struct('[]',[]) ;
        struct('.',[type(list(dynamic)),type(list(dynamic))])),dynamic]).
filter(makerow(A,B,C),[type(list(type(list(dynamic)))),dynamic,dynamic]).

The above corresponds to giving the first argument of transpose the following
binding-type (i.e., a list skeleton where only the first element itself is also
a list skeleton):

:- type arg1 ---> [] ; [list(dynamic) | list(dynamic)].

Using this division, the specialised program for transpose([[a,b],[c,d]],R) is:

transpose([[a,b],[c,d]],A) :- transpose__0(a,b,[c,d],A).
transpose__0(B,C,[D,E],[[B,D],[C,E]]).
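
Both specialised programs can be checked against the original benchmark
program. The sketch below reconstructs the original clauses from the annotated
program given earlier in this appendix and places the two residual predicates
next to it; they can coexist because they have different arities.

```prolog
% Original transpose program, obtained by dropping the annotations.
transpose(A, []) :- nullrows(A).
transpose(A, [B|C]) :- makerow(A, B, D), transpose(D, C).
makerow([], [], []).
makerow([[A|B]|C], [A|D], [B|E]) :- makerow(C, D, E).
nullrows([]).
nullrows([[]|A]) :- nullrows(A).

% Residual predicate for the division list(list(dynamic)).
transpose__0(B, C, D, E, [[B,D],[C,E]]).

% Residual predicate for the more refined division.
transpose__0(B, C, [D,E], [[B,D],[C,E]]).
```

The queries transpose([[a,b],[c,d]], R), transpose__0(a,b,c,d,R) and
transpose__0(a,b,[c,d],R) all compute R = [[a,c],[b,d]]. The specialised
versions do so in a single resolution step.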



